Comparing Machine Learning Algorithms to Predict COVID-19 Mortality Using Chest Computed Tomography Severity Score Data
Researchers at Ayatollah Talleghani Hospital in Abadan, Iran, have conducted a study comparing how well various machine learning algorithms predict mortality among COVID-19 patients. The study, published in Scientific Reports, analyzed data from a hospital-based registry database containing information on 815 COVID-19 patients recorded between February and December 2020.
The dataset included patient demographics, clinical features, personal disease history, laboratory results, and chest computed tomography (CT) severity scores. The severity of pulmonary involvement in CT images was assessed using a scoring system. Two radiologists reviewed the CT images, and any disagreements were resolved by an experienced attending radiologist.
To ensure data quality, a thorough pre-processing step was performed, which involved excluding records with more than 70% missing data and imputing missing values for continuous and discrete variables. Noisy and abnormal values were also addressed by consulting an expert panel.
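The pre-processing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: the article does not say which imputation rule was used, so the mean (continuous) and mode (discrete) fills below are assumptions, and the field names and values are hypothetical.

```python
from statistics import mean, mode

MISSING_THRESHOLD = 0.70  # from the article: drop records with more than 70% missing data


def drop_sparse_records(records, fields, threshold=MISSING_THRESHOLD):
    """Keep only records whose share of missing (None) fields is at most `threshold`."""
    kept = []
    for rec in records:
        missing = sum(1 for f in fields if rec.get(f) is None)
        if missing / len(fields) <= threshold:
            kept.append(rec)
    return kept


def impute(records, continuous, discrete):
    """Fill gaps with the column mean (continuous) or mode (discrete).

    Assumption: mean/mode imputation is a common default, not a rule
    confirmed by the article.
    """
    fill = {}
    for f in continuous:
        observed = [r[f] for r in records if r.get(f) is not None]
        if observed:
            fill[f] = mean(observed)
    for f in discrete:
        observed = [r[f] for r in records if r.get(f) is not None]
        if observed:
            fill[f] = mode(observed)
    # Copy each record, overwriting only the fields that were missing
    return [{**r, **{f: v for f, v in fill.items() if r.get(f) is None}} for r in records]


# Hypothetical records; None marks a missing value
fields = ["age", "wbc", "creatinine", "sex"]
records = [
    {"age": 64, "wbc": 11.2, "creatinine": 1.4, "sex": "M"},
    {"age": 51, "wbc": None, "creatinine": 0.9, "sex": "F"},
    {"age": None, "wbc": None, "creatinine": None, "sex": "M"},  # 75% missing, dropped
    {"age": 70, "wbc": 9.0, "creatinine": None, "sex": None},
    {"age": 58, "wbc": 7.4, "creatinine": 1.1, "sex": "F"},
]
clean = impute(drop_sparse_records(records, fields), ["age", "wbc", "creatinine"], ["sex"])
```

The expert-panel review of noisy and abnormal values is a manual step and is not represented here.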
The study focused on RT-PCR-positive COVID-19 cases and excluded patients with negative test results or unknown dispositions, as well as patients under 18 years old. After applying these criteria, the final sample consisted of 707 cases in the survival group and 108 cases in the death group.
To address the class imbalance between the survival and death groups, the researchers used the synthetic minority over-sampling technique (SMOTE), which creates synthetic samples of the minority class (deceased patients) by interpolating between minority-class instances and their nearest neighbors. Balancing the dataset in this way reduces a model's tendency to favor the majority class.
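The core idea of SMOTE can be shown with a short standard-library sketch. It is a simplified illustration rather than the implementation the researchers used: the feature vectors are hypothetical, the neighbor count `k` is an arbitrary choice, and balancing the classes to exactly 1:1 is an assumption the article does not confirm.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create `n_synthetic` points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # requires at least two minority samples
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical two-feature minority samples (e.g. CT severity score, creatinine)
deceased = [[18.0, 1.9], [20.0, 2.3], [15.0, 1.6], [22.0, 2.8]]

n_survived, n_deceased = 707, 108   # group sizes reported in the article
needed = n_survived - n_deceased    # synthetic cases required for a 1:1 balance (assumed target)
new_points = smote(deceased, n_synthetic=needed, k=3)
```

Because every synthetic point lies on a line segment between two real minority samples, the new cases stay inside the region already occupied by the minority class. In practice, oversampling is usually applied only to the training split so that evaluation data remain untouched.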
To determine the most important variables for mortality prediction, the researchers employed XGBoost, random forest, and chi-squared tests. These methods identified variables such as CT severity score, white blood cell count, and serum creatinine as strong predictors of mortality.
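Of the three selection methods, the chi-squared test is the simplest to illustrate. The sketch below computes the Pearson chi-squared statistic for a contingency table of a categorical feature against outcome and ranks features by it. The counts and feature names are hypothetical, and the article does not describe how the chi-squared results were combined with the XGBoost and random forest importances, so only the chi-squared part is shown.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for an r x c contingency table of counts.

    Assumes every row and column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2 x 2 tables: rows = feature absent/present, columns = survived/died
tables = {
    "high_ct_score": [[30, 10], [20, 40]],
    "feature_b": [[25, 25], [25, 25]],
}
stats = {name: chi_squared(t) for name, t in tables.items()}
ranking = sorted(stats, key=stats.get, reverse=True)
```

For a 2 x 2 table (one degree of freedom), a statistic above about 3.84 corresponds to p < 0.05, so in this toy example only `high_ct_score` would be flagged as associated with the outcome.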
The study utilized eight machine learning algorithms, including decision trees, support vector machines, and logistic regression, to develop predictive models for COVID-19 mortality. The performance of these models was evaluated using metrics such as accuracy, precision, sensitivity, specificity, and area under the ROC curve (AUC).
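The evaluation metrics named above all derive from a model's predictions on labeled cases. The following sketch shows one way to compute them by hand; the labels and scores are made up, label 1 is assumed to mean a deceased patient, and the 0.5 decision threshold is an arbitrary choice.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity from binary labels.

    Assumes at least one predicted positive and at least one case of each class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = deceased, 0 = survived) and predicted risk scores
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
model_auc = auc(y_true, scores)
```

Unlike the other four metrics, AUC is computed from the raw scores rather than thresholded predictions, which is why it is often reported alongside them when classes are imbalanced.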
The researchers obtained ethical approval for the study from the Abadan University of Medical Sciences and ensured patient privacy and confidentiality throughout the research process. Informed consent was obtained from all subjects or their legal guardians.
The findings of this study highlight the potential of machine learning algorithms to predict mortality in COVID-19 patients. By analyzing key variables such as CT severity scores and laboratory results, these models can give healthcare professionals information that supports clinical decisions and timely interventions. This research contributes to the growing body of knowledge on using machine learning in healthcare, particularly in the context of the COVID-19 pandemic.