Classifying 2-year recurrence in patients with DLBCL using clinical variables with imbalanced data and machine learning methods. (November 2020)
- Record Type:
- Journal Article
- Title:
- Classifying 2-year recurrence in patients with DLBCL using clinical variables with imbalanced data and machine learning methods. (November 2020)
- Main Title:
- Classifying 2-year recurrence in patients with DLBCL using clinical variables with imbalanced data and machine learning methods
- Authors:
- Wang, Lei
Zhao, ZhiQiang
Luo, YanHong
Yu, HongMei
Wu, ShuQing
Ren, XiaoLu
Zheng, ChuChu
Huang, XueQian
- Abstract:
- Highlights: We set complete remission (CR) as the time node for the incidence of relapsed/refractory DLBCL, which continues to rise and calls for risk-tailored early diagnostic and primary prevention strategies. Our study aims to build classifiers that differentiate patients with relapsed/refractory DLBCL from those who remain stable after first reaching CR, and to build probability models that give clinicians a reference for identifying patients at high risk. Relapsed/refractory DLBCL is not only the major cause of high mortality but also causes class imbalance between the recurrence and non-recurrence populations, which might significantly reduce the accuracy of machine learning models. To deal with the class-imbalance problem, SMOTE sampling is applied at the data level, and cost-sensitive and ensemble learning methods are applied at the model level. We have built both classifiers and probability prediction models for the 2-year recurrence hazard in DLBCL patients who first reached CR. As SVM cannot provide a probability for each sample, Platt scaling was applied to meet this need.
Abstract:
Background: Treatments are limited for patients with relapsed/refractory diffuse large B-cell lymphoma (DLBCL), and their survival rate is low. Prediction of the recurrence hazard for each patient could give clinicians a reference regarding chemotherapy regimens to extend patients' period of long-term remission. As current strategies cannot satisfy this need, we established predictive models to distinguish patients with DLBCL in complete remission who had a recurrence within 2 years from those who did not.
Methods: We assessed 518 patients with DLBCL, treated between January 2011 and July 2016, and measured 52 variables for each patient. Seventeen variables were first selected by variable selection methods (including Lasso, Adaptive Lasso, and Elastic Net). We then built classifiers and probability models for imbalanced data by combining SMOTE sampling, cost-sensitive, and ensemble learning (consisting of AdaBoost, voting strategy, and Stacking) methods with machine learning methods (Support Vector Machine, Backpropagation Artificial Neural Network, Random Forest), respectively. Finally, we assessed their performance.
Results: The disease stage and five other variables are significant indicators of recurrence. The SVM with the AdaBoost ensemble learning method, modeled on SMOTE data, performs best among all classifiers (Sensitivity=97.3%, AUC=96%, RMSE=19.6%, G-mean=96%). Among the probability models, the SVM with AdaBoost (AUC=98.7%, RMSE=17.7%, MXE=12.7%, Cal mean=3.2%, BS0=2.5%, BS1=4%, BSALL=3.1%) and random forest (AUC=99.5%, RMSE=19.8%, MXE=16.2%, Cal mean=9.1%, BS0=4.8%, BS1=2.9%, BSALL=3.9%), both modeled on SMOTE-sampled data, perform well.
Conclusions: This predictive model has high accuracy for almost all DLBCL patients, and the six indicators can serve as recurrence signals.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 196(2020)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 196(2020)
- Issue Display:
- Volume 196, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 196
- Issue:
- 2020
- Issue Sort Value:
- 2020-0196-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-11
- Subjects:
- Relapsed/refractory DLBCL -- Imbalanced data -- Classification and possibility prediction -- Machine learning -- Indicators
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2020.105567
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14758.xml