This scientific article reports on the prognosis of gynecological endometrioid adenocarcinoma with squamous differentiation (GE-ASqD). To build their machine-learning models, the researchers used data from the Surveillance, Epidemiology, and End Results (SEER) database. Patient information included primary site, regional nodes examined, AJCC T, N, and M stage, age at diagnosis, race, sequence number, marital status, stage, surgery status, radiation status, and chemotherapy status. Patients were excluded if they were younger than 18 years, if the tumor was not the primary tumor, or if information on cancer stage, tumor size, or T, N, or M stage was unknown.
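The exclusion criteria above can be sketched as a simple record filter. This is only an illustration: the field names (`age`, `is_primary`, `stage`, `tumor_size`, `t`, `n`, `m`), the `"Unknown"` marker, and the toy records are assumptions made for the example, not the actual SEER column names or the authors' code.

```python
# Hypothetical record layout; a real SEER export uses different column names.
REQUIRED_FIELDS = ("stage", "tumor_size", "t", "n", "m")


def is_eligible(record):
    """Return True if a record passes the three exclusion criteria."""
    if record["age"] < 18:            # exclude patients younger than 18
        return False
    if not record["is_primary"]:      # exclude non-primary tumors
        return False
    # Exclude records where stage, tumor size, T, N, or M is missing.
    # Missing is assumed to be encoded as None or the string "Unknown".
    return all(record.get(f) not in (None, "Unknown") for f in REQUIRED_FIELDS)


# Toy records invented for illustration only.
patients = [
    {"age": 54, "is_primary": True, "stage": "I", "tumor_size": 32,
     "t": "T1", "n": "N0", "m": "M0"},
    {"age": 17, "is_primary": True, "stage": "I", "tumor_size": 20,
     "t": "T1", "n": "N0", "m": "M0"},
    {"age": 61, "is_primary": False, "stage": "II", "tumor_size": 45,
     "t": "T2", "n": "N0", "m": "M0"},
    {"age": 70, "is_primary": True, "stage": "Unknown", "tumor_size": 50,
     "t": "T2", "n": "N1", "m": "M0"},
]

cohort = [p for p in patients if is_eligible(p)]
print(len(cohort), "of", len(patients), "records retained")
```

Keeping the criteria in one predicate function makes it easy to report how many records each rule removes, which is commonly shown in a cohort flow diagram.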
To analyze the data, the team applied supervised machine-learning algorithms, several of them ensemble-based: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Gradient Boosting (Gbdt). The research team used the X-tile software to convert continuous variables into categorical variables. The primary endpoint was overall survival (OS), defined as the time from diagnosis to death from any cause.
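Once cut-points have been chosen (the article used X-tile for that step), turning a continuous variable into a categorical one is a lookup. The sketch below shows only the lookup. The cut-points 50 and 65 and the labels are hypothetical placeholders, not values reported in the article, and the code does not reproduce how X-tile selects cut-points.

```python
from bisect import bisect_right


def categorize(value, cutpoints, labels):
    """Map a continuous value to a category using sorted cut-points.

    A value equal to a cut-point is placed in the upper group.
    """
    if len(labels) != len(cutpoints) + 1:
        raise ValueError("need exactly one more label than cut-points")
    return labels[bisect_right(cutpoints, value)]


# Hypothetical cut-points and labels, assumed for illustration only.
AGE_CUTPOINTS = [50, 65]
AGE_LABELS = ["low", "mid", "high"]

ages = [34, 50, 64, 65, 81]
age_groups = [categorize(a, AGE_CUTPOINTS, AGE_LABELS) for a in ages]
print(list(zip(ages, age_groups)))
```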
The patient sample was split into a training set and a test set. The models were evaluated and compared by their prediction accuracy and by the area under the curve (AUC). Specifically, the F1 measure and related evaluation indicators (precision, accuracy, and recall) were used. The Python programming language was used for basic data processing in the machine-learning work, and R version 3.6.1 was used for statistical analyses.
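The evaluation indicators named above can be computed directly from true labels and model scores. The following is a minimal sketch on invented toy data. It assumes binary labels with 1 as the positive class, a decision threshold of 0.5, and that "AUC" means the area under the ROC curve, none of which the summary states explicitly.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy from binary predictions."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def roc_auc(y_true, scores, positive=1):
    """ROC AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision, recall, F1, and accuracy depend on the chosen threshold, while AUC is computed from the ranking of scores alone, which is why the two are usually reported together.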
The research team noted that the datasets analyzed come from the SEER database and are freely available for research and publication, so written informed consent from patients was not required. They also stated that the study raised no ethical issues or conflicts of interest because it is based on open data.
Boosting is a machine-learning technique that can be used for regression and classification problems. At each step it produces a weak prediction model (such as a decision tree) and adds it, with a weight, to the overall model. If the weak model at each step is generated according to the gradient direction of the loss function, the method is called gradient boosting (Gbdt).
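To make the gradient boosting idea concrete, here is a minimal from-scratch sketch. It is not the authors' implementation: it assumes a one-feature regression problem with squared-error loss and depth-one trees (stumps) as the weak models, whereas the article addresses classification with library implementations. With squared error, the negative gradient of the loss is simply the residual, so each stump is fit to the current residuals. The data, the number of rounds, and the learning rate are arbitrary illustrative choices.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump that minimizes squared error."""
    candidates = sorted(set(xs))[:-1]  # drop the max so both sides are non-empty
    if not candidates:
        raise ValueError("need at least two distinct feature values")
    best = None
    for threshold in candidates:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Additive model: start from the mean, then add one weighted stump per round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f is y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

model = fit_gradient_boosting(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

The learning rate plays the role of the weight given to each weak model: smaller values need more rounds but tend to overfit less.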
In conclusion, the research team developed a classification model to efficiently stratify GE-ASqD patients based on independent prognostic factors. The report provides valuable insight into the potential prognostic factors associated with the tumor.