BH-index: A predictive system based on serum biomarkers and ensemble learning for early colorectal cancer diagnosis in mass screening. (November 2021)
- Record Type:
- Journal Article
- Title:
- BH-index: A predictive system based on serum biomarkers and ensemble learning for early colorectal cancer diagnosis in mass screening. (November 2021)
- Main Title:
- BH-index: A predictive system based on serum biomarkers and ensemble learning for early colorectal cancer diagnosis in mass screening
- Authors:
- Battista, Antonio
Battista, Rosa Alessia
Battista, Federica
Iovane, Gerardo
Landi, Riccardo Emanuele
- Abstract:
- Highlights: This study proposes a predictive system based on 17 serum biomarkers and ensemble learning for early colorectal cancer diagnosis in mass screening; the system consists of a binary predictor, which predicts the presence/absence of colorectal cancer, and a staging predictor, which predicts the related TNM stage. Plasma proteins proved to be significant in predicting the presence/absence and the related TNM stage of colorectal cancer in patients. Ceruloplasmin and α-2-Macroglobulin are significant in predicting early colorectal cancer presence/absence with XGBoost and Random Forest models, while CA 50 and α-1-Antitrypsin can be neglected. Extended reality allows interpreting the significance of serum biomarkers in early colorectal cancer diagnosis through the predictors' bias-variance ratio. Ensemble learning through majority voting reduces the noise in the prediction of early colorectal cancer presence/absence.
Abstract: Background and objective: Colorectal cancer is one of the most common malignancies in the general population. Artificial intelligence methodologies based on serum parameters are under continuous development to obtain less expensive tools for highly sensitive diagnosis. This study proposes a predictive system based on serum biomarkers and ensemble learning to predict colorectal cancer presence and the related TNM stage in patients.
Methods: We selected 17 significant plasma proteins, i.e., Carcinoembryonic Antigen, CA 19-9, CA 125, CA 50, CA 72-4, Tissue Polypeptide Antigen, C-Reactive Protein, Ceruloplasmin, Haptoglobin, Transferrin, Ferritin, α-1-Antitrypsin, α-2-Macroglobulin, α-1 Acid Glycoprotein, Complement C4, Complement C3, and Retinol Binding Protein, measured in 345 patients (248 affected by the neoplastic disease). The proposed system consists of two predictors, i.e., binary and staging; the former predicts the presence/absence of cancer, while the latter identifies the related TNM stage (I, II, III, or IV). The experiments deployed and compared Random Forest, XGBoost, Support Vector Machine, and Multilayer Perceptron, with feature selection based on Gini importance and with dimensionality reduction via PCA.
Results: The system using XGBoost as both the binary and the staging predictor reaches 91.30% accuracy, 90% sensitivity, and 93.33% specificity for the presence/absence outcome, and 66.66% accuracy for the staging outcome. With the training set expanded in favor of positive patients and with majority voting, the combination of Support Vector Machine, XGBoost, and Multilayer Perceptron as the binary predictor reaches 98.03% accuracy, 100% sensitivity, and 92.30% specificity, while the combination of Random Forest, XGBoost, and Multilayer Perceptron as the staging predictor achieves 60% accuracy. The final system reaches 98.03% and 66.66% accuracy for the binary and staging predictors, respectively. The biomarkers that contribute most to the binary decision are Ceruloplasmin and α-2-Macroglobulin, while the least significant are CA 50 and α-1-Antitrypsin; for the staging decision, Carcinoembryonic Antigen and α-1 Acid Glycoprotein are the most significant.
Conclusions: The present study shows the effectiveness of using serum biomarkers as feature dimensions for early colorectal cancer diagnosis and of using majority voting to reduce noise in the prediction.
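A minimal sketch of the hard majority voting described in the abstract, assuming three already-trained binary classifiers whose 0/1 predictions (1 = cancer present, 0 = absent) are available as plain lists. The function names `majority_vote` and `binary_metrics` and all toy predictions are hypothetical illustrations, not the authors' code or data; accuracy, sensitivity, and specificity use the standard confusion-matrix definitions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard majority voting: each sample gets the label predicted by most models.

    With an odd number of binary classifiers (e.g. three) a tie cannot occur.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")
    voted = []
    for i in range(n_samples):
        # Count the votes all models cast for sample i and keep the most frequent label.
        counts = Counter(model_pred[i] for model_pred in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical toy data (not from the study): each model errs on two different samples.
y_true = [1, 1, 1, 0, 0, 0]
svm_pred = [1, 0, 1, 0, 1, 0]
xgb_pred = [1, 1, 0, 1, 0, 0]
mlp_pred = [0, 1, 1, 0, 0, 1]

ensemble_pred = majority_vote([svm_pred, xgb_pred, mlp_pred])
print("SVM alone:", binary_metrics(y_true, svm_pred))
print("Majority vote:", binary_metrics(y_true, ensemble_pred))
```

Because the three toy models make their errors on different samples, every sample still receives at least two correct votes, so the ensemble classifies all six correctly. When models make correlated errors, voting removes much less of the noise.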
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 212(2021)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 212(2021)
- Issue Display:
- Volume 212, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 212
- Issue:
- 2021
- Issue Sort Value:
- 2021-0212-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- BH-index -- Colorectal cancer diagnosis -- Tumor biomarker -- Liquid biopsy -- Artificial intelligence -- Simulation -- Ensemble learning -- Mass screening -- Machine learning -- Majority voting
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2021.106494
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 19771.xml