A new diagnostic model that combines bioinformatics and machine learning has been developed to differentiate bipolar disorder (BD) from schizophrenia (SC) and major depressive disorder (MDD), according to a recent study. The model relies on large-scale data processing, hidden pattern mining, and the identification of complex interactions, and it could help clinicians diagnose these disorders and tell them apart.
To train the model, brain tissue datasets from patients with SC, BD, and MDD were selected from the Gene Expression Omnibus (GEO) database. Peripheral blood datasets were also selected for validation, which increased the sample size and the reliability of the study. The datasets were merged and batch effects were removed to produce a single, more comprehensive expression matrix for analysis.
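The study summary does not say which batch-correction method was used. As a rough illustration of the idea only, the sketch below centers one gene's expression values per batch and restores the overall mean. This is a deliberately simple stand-in, not the authors' procedure, and the function name `center_by_batch` and the toy values are hypothetical.

```python
from collections import defaultdict


def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples, then add back the overall mean.

    Assumption: a crude per-gene mean-centering, used here only to show
    what "removing a batch effect" means; real pipelines use richer models.
    """
    overall = sum(values) / len(values)
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in grouped.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# Toy expression values for one gene measured in two hypothetical datasets.
expr = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = center_by_batch(expr, batch)
print(corrected)
```

After centering, both batches share the same mean, so the remaining variation reflects differences between samples rather than between datasets.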
Differential expression analysis was performed with the Linear Models for Microarray Data (limma) method, which identified differentially expressed genes (DEGs) between each of the comparison groups and controls. Genes with a log2 fold change (FC) greater than 1 were considered biologically significant, and genes with a p-value below 0.05 were considered statistically significant.
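The thresholding step can be sketched in a few lines. The example assumes the differential expression results are already available as a mapping from gene name to a (log2 FC, p-value) pair, and it applies the cutoff to the absolute log2 FC so that down-regulated genes are kept as well. That choice is an assumption, since the summary only says "greater than 1". The gene names and values are invented for illustration.

```python
def select_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return genes whose |log2 FC| exceeds lfc_cutoff and whose p-value is below p_cutoff."""
    return [
        gene
        for gene, (log2fc, pval) in results.items()
        if abs(log2fc) > lfc_cutoff and pval < p_cutoff
    ]


# Hypothetical results table: gene -> (log2 fold change, p-value).
results = {
    "GENE_A": (1.8, 0.001),   # large effect, significant: kept
    "GENE_B": (0.4, 0.0005),  # significant but small effect: dropped
    "GENE_C": (-1.3, 0.03),   # down-regulated, significant: kept
    "GENE_D": (2.1, 0.20),    # large effect, not significant: dropped
}
degs = select_degs(results)
print(degs)
```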
The DEGs obtained from the single and combined datasets were cross-screened using a Venn diagram to identify genes capable of distinguishing between the three diseases. Functional enrichment analysis was then performed on these genes, using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations to identify the pathways associated with them.
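Cross-screening with a Venn diagram amounts to set arithmetic on the DEG lists. The sketch below, using hypothetical gene sets, assigns each gene to the exact combination of lists it appears in, which corresponds to one region of the diagram.

```python
def venn_regions(sets):
    """Map each combination of set names to the genes found in exactly those sets."""
    regions = {}
    all_genes = set().union(*sets.values())
    for gene in all_genes:
        members = frozenset(name for name, genes in sets.items() if gene in genes)
        regions.setdefault(members, set()).add(gene)
    return regions


# Hypothetical DEG lists, one per disease-versus-control comparison.
deg_sets = {
    "SC": {"G1", "G2", "G3"},
    "BD": {"G2", "G3", "G4"},
    "MDD": {"G3", "G5"},
}
regions = venn_regions(deg_sets)

shared_by_all = regions[frozenset({"SC", "BD", "MDD"})]
bd_only = regions[frozenset({"BD"})]
print(sorted(shared_by_all), sorted(bd_only))
```

Genes that fall in a single-disease region, such as `bd_only` here, are the natural candidates for telling that disease apart from the others.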
LASSO regression was used for variable selection and regularization to improve the predictive power and interpretability of the model. Survival time, survival status, and gene expression data were integrated for the regression analysis. Genes that could distinguish SC from BD, and BD from MDD, were identified using the LASSO-Cox technique.
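To show how an L1 penalty performs variable selection, here is a minimal sketch of LASSO fitted by coordinate descent. It is a simplification in two respects: it uses an ordinary squared-error loss rather than the Cox partial likelihood mentioned in the study, and it assumes the inputs are already centered so no intercept is needed. The data, gene names, and penalty value are invented.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy design with three hypothetical genes (columns) and four samples (rows).
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Outcome driven strongly by GENE_A and only very weakly by GENE_C.
y = [2.05, -2.05, 1.95, -1.95]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [g for g, b in zip(genes, beta) if b != 0.0]
print(beta, selected)
```

The penalty shrinks the strong coefficient slightly and sets the weak ones exactly to zero, which is why LASSO doubles as a feature-selection step.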
In addition, the GeneMANIA database was used to generate a protein-protein interaction (PPI) network to gain insight into gene function and prioritize genes for functional analysis. Receiver operating characteristic (ROC) analysis was performed to evaluate the diagnostic accuracy of the model, and an artificial neural network (ANN) was built from the feature genes to serve as a high-precision diagnostic model.
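ROC analysis is usually summarized by the area under the curve (AUC). One way to compute it, sketched below under the assumption of binary labels (1 for the disease of interest, 0 otherwise), is as the fraction of positive-negative pairs in which the positive sample receives the higher score. The labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for three cases (1) and three comparison samples (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.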
The CIBERSORT method was also used to calculate immune-infiltrating cell scores for each sample, and immune cell infiltration analysis was conducted to examine the correlation between the target genes and immune cells.
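The correlation step can be illustrated once per-sample immune cell scores are available. The summary does not state which correlation coefficient was used, so the sketch below assumes Spearman's rank correlation, a common choice for this kind of analysis. The expression values and cell fractions are invented.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average rank for the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five samples: expression of one target gene and
# the estimated fraction of one immune cell type.
gene_expr = [2.1, 3.4, 1.0, 5.6, 4.2]
cell_fraction = [0.10, 0.22, 0.05, 0.40, 0.18]
rho = spearman(gene_expr, cell_fraction)
print(round(rho, 3))
```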
Overall, the study suggests that a diagnostic model built on bioinformatics and machine learning could help differentiate BD from SC and MDD. By combining differential expression analysis, feature selection, and immune infiltration analysis, the researchers also shed light on the molecular mechanisms underlying these disorders, which may support improved diagnosis and treatment in the future.