Childhood asthma is a common condition affecting children worldwide, and early diagnosis and treatment are crucial for managing it effectively. A recent study applied machine learning (ML) techniques to longitudinal data to predict childhood asthma, with the aim of supporting earlier diagnosis and treatment.
The study drew on the CHILD Cohort Study, a large-scale longitudinal study conducted across four Canadian provinces that collected data on family medical history and on early-life clinical and environmental factors for children up to 4 years of age. The researchers used five types of ML models and two ensemble methods to predict an asthma diagnosis at 5 years of age.
The primary objective of the study was to identify the earliest time point at which asthma can be accurately predicted and to determine the relative importance of each predictor over time. These findings are intended to lay the foundation for a clinically applicable asthma prediction tool that helps clinicians identify and treat high-risk children.
The CHILD Cohort Study, one of the largest ongoing longitudinal birth cohort studies in Canada, provided the data for this study. It recruited pregnant women from multiple sites across the country between 2008 and 2012. Only children with complete questionnaires and physician-diagnosed asthma outcomes at age 5 were included in the study.
The dataset was split into a training dataset (85%), used for training and tuning the ML models, and a holdout dataset (15%), used for assessing their performance. The ML models were Logistic Regression, Random Forest, eXtreme Gradient Boosting (XGBoost), Decision Tree, and Support Vector Machine. Two ensemble methods, voting and stacking, were also used to try to improve on the predictive performance of the individual models.
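The split and the voting ensemble can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: stratifying the split on the asthma label, the fixed seed, and soft voting (averaging predicted probabilities) are all assumptions, and the function names `stratified_split` and `soft_vote` are hypothetical. Stacking, which instead trains a meta-model on the base models' outputs, is not shown.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_frac=0.15, seed=0):
    """Return (train_idx, holdout_idx), keeping each class's share in both parts.

    Assumption: stratification on the asthma label is illustrative only; the
    study reports an 85%/15% split without specifying how it was drawn.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, holdout_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_holdout = round(len(idx) * holdout_frac)
        holdout_idx.extend(idx[:n_holdout])
        train_idx.extend(idx[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


def soft_vote(prob_lists, weights=None):
    """Average per-child asthma probabilities from several fitted models.

    Assumption: soft (probability-averaging) voting; hard voting would
    instead take a majority over predicted class labels.
    """
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]
```

For example, with 20 asthma cases among 100 children, `stratified_split` places 3 cases and 12 non-cases in the holdout set, so the 20% case rate is the same in both parts.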
A total of 132 variables from six time points (birth, 6 months, 1 year, 2 years, 3 years, and 4 years) were used as input for each ML algorithm to predict asthma diagnosis at age 5. The dataset included parental information, children's clinical information, and environmental information. It was preprocessed for the ML algorithms, including feature engineering and imputation of missing values.
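Imputation can be illustrated with a simple per-column median, fitted on the training data only so that no information from the holdout set leaks into the models. The median strategy and the helper names `fit_median_imputer` and `apply_imputer` are assumptions for illustration; the text does not state which imputation method the study used.

```python
from statistics import median


def fit_median_imputer(rows):
    """Learn one median per column from training rows; None marks a missing value."""
    medians = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # 0.0 is an arbitrary placeholder for a column with no observed values.
        medians.append(median(observed) if observed else 0.0)
    return medians


def apply_imputer(rows, medians):
    """Replace missing values (None) with the medians learned on the training set."""
    return [[m if v is None else v for v, m in zip(r, medians)] for r in rows]
```

The same `medians` list fitted on the training rows would then be applied unchanged to the holdout rows.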
To find the subset of features that gave the best model performance, the researchers used Sequential Floating Forward Selection (SFFS), an algorithmic feature selection technique. Selection was performed in a time-sequential, cumulative manner, so that the longitudinal importance of each feature for predicting asthma at age 5 could be assessed.
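The floating search can be sketched as follows. `score` is a hypothetical placeholder for the selection criterion (in practice, for example, a cross-validated AUROC of a model trained on the candidate subset), and `cumulative_selection` is one possible reading of the time-sequential, cumulative procedure: at each time point, the candidate pool grows to include every variable collected so far. Both function names are hypothetical, and the feature names are assumed to be sortable strings.

```python
def sffs(features, score, k_max):
    """Sequential Floating Forward Selection; returns (best_score, best_subset)."""
    selected = frozenset()
    best = {0: (float("-inf"), selected)}  # best (score, subset) seen per subset size
    while len(selected) < k_max:
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        # Inclusion: add the single feature that raises the score the most.
        add = max(remaining, key=lambda f: score(selected | {f}))
        selected = selected | {add}
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)
        # Conditional exclusion: keep dropping the least useful feature while the
        # smaller subset beats the best subset previously recorded at that size.
        while len(selected) > 2:
            drop = max(sorted(selected), key=lambda f: score(selected - {f}))
            reduced = selected - {drop}
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(selected)] = (s_red, selected)
            else:
                break
    return max(best.values(), key=lambda t: t[0])


def cumulative_selection(features_by_timepoint, score, k_max):
    """Run SFFS on all variables available up to each time point, in order."""
    pool, results = [], {}
    for timepoint, feats in features_by_timepoint:
        pool.extend(feats)
        results[timepoint] = sffs(list(pool), score, k_max)
    return results
```

Unlike plain forward selection, the conditional exclusion step lets SFFS discard a feature added early on if a smaller subset found later scores better than any subset of that size seen so far. Comparing the results across time points shows when adding later variables stops improving the score.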
The performance of the ML models on the holdout dataset was evaluated using the area under the receiver-operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The study also examined whether the ensemble algorithms improved on the predictive performance of the individual models.
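Both metrics can be computed directly from predicted probabilities. The sketch below computes AUROC as the probability that a randomly chosen asthma case is scored above a randomly chosen non-case (ties count half), and AUPRC as average precision, one common estimator of the area under the precision-recall curve and the definition used by scikit-learn's `average_precision_score`. Which AUPRC estimator the study used is not stated here, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as sum over thresholds of (recall gain) * precision."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Treat all examples tied at this score as a single threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap
```

The pairwise AUROC loop is quadratic in the number of children, which is acceptable for illustration; a rank-based formula gives the same value in O(n log n). AUPRC is especially informative when cases are relatively rare, because a random classifier's AUPRC is roughly the case prevalence rather than 0.5.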
In conclusion, this ML study of longitudinal data from the CHILD Cohort Study offers insights for the early diagnosis and treatment of childhood asthma. By identifying the earliest time point for accurate asthma prediction and the relative importance of each predictor over time, it lays the groundwork for a clinically applicable asthma prediction tool that could help clinicians identify and treat high-risk children and, ultimately, improve the management of childhood asthma.