Predict DLBCL patients' recurrence within two years with Gaussian mixture model cluster oversampling and multi-kernel learning. (November 2022)
- Record Type:
- Journal Article
- Title:
- Predict DLBCL patients' recurrence within two years with Gaussian mixture model cluster oversampling and multi-kernel learning. (November 2022)
- Main Title:
- Predict DLBCL patients' recurrence within two years with Gaussian mixture model cluster oversampling and multi-kernel learning
- Authors:
- Xing, Meng
Zhang, Yanbo
Yu, Hongmei
Yang, Zhenhuan
Li, Xueling
Li, Qiong
Zhao, Yanlin
Zhao, Zhiqiang
Luo, Yanhong
- Abstract:
- Highlights: Patients with DLBCL who relapse within two years have a worse prognosis, so we constructed a relapse prediction model to identify those patients and provide a reference for clinicians. This work constructs a secondary-level class imbalance method based on Gaussian mixture model clustering oversampling. It addresses imbalance both between and within classes, and it reduces the noise samples that are introduced when an algorithm generates a large number of new samples. Using multi-kernel learning on single-view data can not only address the linear inseparability of the original space, but also improve the utilization of the original data.
Abstract: Background and objective: Diffuse large B-cell lymphoma (DLBCL) is a common non-Hodgkin's lymphoma in adults. Relapse mainly occurs within two years after diagnosis and has a poor prognosis. Relapse after two years is less frequent and has a better prognosis. In this work, we constructed a model to predict relapse within two years in DLBCL patients, aiming to provide a reference for clinicians implementing individualized treatment.
Method: We propose a secondary-level class imbalance method based on Gaussian mixture model (GMM) clustering resampling to balance the data. We then use a multi-kernel support vector machine (SVM) to describe heterogeneous clinical data. Finally, the two are combined to identify patients who relapse within two years.
Results: Among all the class imbalance methods in this work, inverse weighted-GMM + SMOTEENN performs best. Compared with NO-GMM (directly using SMOTEENN without the GMM clustering process), its area under the ROC curve (AUC) increases by 8.75%, and its expected calibration error (ECE) and Brier score decrease by 2.07% and 3.09%, respectively. Among the four classification algorithms in this work, multiple kernel learning (MKL) has the lowest Brier score and ECE and the highest AUC, accuracy, recall, precision, and F1, giving it the best discrimination and calibration.
Conclusion: Our inverse weighted-GMM + SMOTEENN + MKL (GMM-SENN-MKL) method handles class imbalance and heterogeneous clinical data well and can be used to predict recurrence in DLBCL patients.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 226(2022)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 226(2022)
- Issue Display:
- Volume 226, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 226
- Issue:
- 2022
- Issue Sort Value:
- 2022-0226-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-11
- Subjects:
- DLBCL -- Class imbalance -- Gaussian mixture model clustering oversampling -- Multiple kernel learning -- Recurrence prediction
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2022.107103
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24260.xml
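
The abstract in this record reports the Brier score and the expected calibration error (ECE) as calibration measures. The following is a minimal sketch of how these two quantities are commonly computed for a binary outcome such as relapse within two years. It is not taken from the article: the function names, the equal-width binning, and the default of 10 bins are assumptions made here for illustration, and the authors may have used a different ECE variant.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Weighted mean gap between predicted and observed event rate per bin.

    Assumption: equal-width bins over [0, 1] on the positive-class probability.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        event_rate = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(event_rate - mean_prob)
    return ece


if __name__ == "__main__":
    # Hypothetical labels (1 = relapse) and predicted probabilities.
    labels = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.6, 0.9]
    print(brier_score(labels, probs))
    print(expected_calibration_error(labels, probs))
```

Lower values of both measures indicate better calibration, which is how the abstract's comparison between methods should be read.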