“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”
This paper from eBioMedicine, first authored by Dr. Matthieu Komorowski, a regular AIMed faculty member, reviews the main machine learning methods that have been published for building sepsis diagnostic tools and identifying biomarkers. As he notes in the introduction, “the data-driven techniques to improve definition, early recognition, subtypes characterization, prognostication, and treatment personalization of sepsis” have included studies of both the discovery and the evaluation of biomarkers or digital signatures of sepsis that can contribute to the understanding of pathophysiological pathways. The studies are categorized by purpose (diagnosis, prognosis, or phenotyping) and grouped by data type: routine clinical data and laboratory testing, or non-routine data including cytokines, metabolomics, and gene expression.
Dr. Komorowski points out that while machine learning involves both supervised and unsupervised learning, it is the latter that is less commonly deployed and yet may be more insightful for discovering hidden patterns in high-dimensional datasets (supervised machine learning, of course, provides the sepsis prediction algorithms). Supervised learning methods include the common logistic regression, decision trees and random forests, neural networks, gradient boosting, and others. Among the unsupervised techniques, principal components analysis (PCA) can reduce the dimensionality of large datasets while minimizing information loss, and another method, t-distributed stochastic neighbor embedding (t-SNE), can also reduce dimensionality.
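To make the idea of reducing dimensionality "while minimizing information loss" concrete, here is a minimal, dependency-free sketch of PCA in Python. It is not taken from the paper: the function name `first_principal_component`, the power-iteration approach, and the toy measurements are all assumptions chosen for illustration, and a real analysis would more likely use an established library.

```python
import math

def first_principal_component(rows, iters=200):
    """Return (direction, explained_variance_ratio, scores) for the top component.

    Illustrative sketch only: uses power iteration on the sample covariance
    matrix and assumes the data are not constant and that the starting vector
    is not orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v, as a share of total variance.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    # Project each centered row onto v: one number per patient instead of d.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, explained, scores

# Made-up toy data: five "patients", three strongly correlated measurements.
toy_rows = [
    (1.0, 2.1, 0.5),
    (2.0, 3.9, 1.1),
    (3.0, 6.2, 1.4),
    (4.0, 8.1, 2.1),
    (5.0, 9.8, 2.4),
]
direction, explained, scores = first_principal_component(toy_rows)
print(f"share of variance kept by one component: {explained:.3f}")
```

Because the three toy measurements move together, a single component retains most of the variance, which is the sense in which three columns can be compressed into one with little information lost.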
K-means clustering is a relatively simple unsupervised learning algorithm; hierarchical cluster analysis and latent class and latent profile analysis (LCA and LPA) are also unsupervised learning methods. According to the authors, both LCA and LPA are more flexible because they can accommodate uncertainty in the definition of a disease such as sepsis and can, to some extent, adapt to time series of patient data trajectories (longitudinal clustering). Interestingly, unsupervised learning has yielded sepsis phenotypes that can lead not only to a better understanding of sepsis as a clinical entity but also to different treatment strategies.
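As a rough illustration of why K-means is considered relatively simple, the sketch below implements the standard two-step loop (assign each point to its nearest centroid, then move each centroid to the mean of its members) using only the Python standard library. The function name `kmeans`, its parameters, and the heart-rate and lactate values are hypothetical and invented for this example; they do not come from the paper or from any patient dataset.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples (sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

# Made-up (heart rate, lactate) pairs forming two obvious groups.
toy_points = [
    (80, 1.0), (85, 1.2), (78, 0.9), (82, 1.1),
    (120, 4.0), (125, 4.5), (118, 3.8), (130, 5.0),
]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

In this toy case the raw heart-rate values dominate the distance calculation; with real data the features would normally be standardized first, and the number of clusters would need to be justified rather than fixed in advance.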
This focused yet comprehensive and excellent review of published machine learning models in sepsis highlights the necessary, strong movement toward more accurate characterization of a heterogeneous disease and its trajectory. The authors conclude by recognizing the challenge and urgency of correct sepsis labeling and by noting that the vast majority of these models lack prospective validation.