Machine Learning Interpretability Methods to Characterize the Importance of Hematologic Biomarkers in Prognosticating Patients with Suspected Infection

Dipak P. Upadhyaya; Yasir Tarabichi; Katrina Prantzalos; Salman Ayub; David C Kaelber; Satya S. Sahoo

doi:10.1101/2023.05.30.23290757

Abstract

Early detection of sepsis in patients admitted to the emergency department (ED) is an important clinical objective, as early identification and treatment can help reduce morbidity and a mortality rate of 20% or higher. Hematologic changes during sepsis-associated organ dysfunction are well established, and a new biomarker called Monocyte Distribution Width (MDW) was recently approved by the US Food and Drug Administration for sepsis. However, MDW, which quantifies monocyte activation in sepsis patients, is not a routinely reported parameter and requires specialized proprietary laboratory equipment. Further, the relative importance of MDW compared to other routinely available hematologic parameters and vital signs has not been studied, which makes it difficult for resource-constrained hospital systems to make informed decisions about its adoption. To address this issue, we analyzed data from a cohort of ED patients (n=10,229) admitted to a large regional safety-net hospital in Cleveland, Ohio, with suspected infection who later developed poor outcomes associated with sepsis. We developed a new analytical framework consisting of seven data models and an ensemble of high-accuracy machine learning (ML) algorithms (accuracy values ranging from 0.83 to 0.90) for the prediction of outcomes more common in sepsis than uncomplicated infection (3-day intensive care unit stay or death). To characterize the contributions of individual hematologic parameters, we applied the Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) interpretability methods to the high-accuracy ML algorithms. The interpretability results consistently showed that the value of MDW is grossly attenuated in the presence of other routinely reported hematologic parameters and vital signs data.
Further, this study shows for the first time that complete blood count with differential (CBC-DIFF) together with vital signs data can be used as a substitute for MDW in high-accuracy ML algorithms to screen for poor outcomes associated with sepsis.
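
As a rough illustration of the Shapley-based attribution that SHAP builds on, the sketch below computes exact Shapley values for a single instance by enumerating every feature coalition, with absent features set to a reference value. It is not the authors' pipeline: the scoring function, its coefficients, the feature names (`mdw`, `wbc`, `heart_rate`), and the patient and reference values are all hypothetical and carry no clinical meaning. Exact enumeration is only practical for a handful of features; SHAP implementations rely on approximations or model-specific shortcuts.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance against a single baseline.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Model input with the coalition plus the feature of interest.
                with_f = {f: x[f] if (f in subset or f == name) else baseline[f]
                          for f in names}
                # Model input with the coalition only.
                without_f = {f: x[f] if f in subset else baseline[f]
                             for f in names}
                total += weight * (predict(with_f) - predict(without_f))
        phi[name] = total
    return phi


def toy_risk_score(features):
    # Hypothetical scoring function; coefficients are illustrative only.
    return (0.02 * features["mdw"]
            + 0.03 * features["wbc"]
            + 0.01 * features["heart_rate"]
            + 0.001 * features["mdw"] * features["wbc"])


# Hypothetical patient and reference values, not taken from the study.
patient = {"mdw": 25.0, "wbc": 15.0, "heart_rate": 110.0}
reference = {"mdw": 20.0, "wbc": 8.0, "heart_rate": 80.0}

attributions = shapley_values(toy_risk_score, patient, reference)
for feature, value in attributions.items():
    print(f"{feature}: {value:+.4f}")
```

The attributions sum to the difference between the score for the patient and the score for the reference, which is the additivity property that lets per-feature contributions be compared on a common scale.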

Competing Interest Statement

The authors report no conflicts of interest related to this manuscript. YT receives research funding from Beckman Coulter Inc. (Brea, CA, USA). Beckman Coulter Inc. played no role in the design or analysis of this study or its resultant manuscript.

Funding Statement

National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health/National Center for Advancing Translational Sciences, National Institute on Drug Abuse

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

MetroHealth hospital system institutional review board (IRB) (approval: STUDY00000097)

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

We have increased the number of subjects for data analysis and revised the findings with updated analysis.

Data Availability

The machine learning workflows and performance metrics were implemented using the Scikit libraries. The individual patient records cannot be made publicly available due to regulatory reasons. Models and data can be made available on request; however, this requires the execution of a data transfer agreement approved by the participating institutions together with an Institutional Review Board (IRB) or equivalent ethics approval for the proposed study.