Handling missing values in machine learning to predict patient-specific risk of adverse cardiac events: Insights from REFINE SPECT registry. (June 2022)
- Record Type:
- Journal Article
- Title:
- Handling missing values in machine learning to predict patient-specific risk of adverse cardiac events: Insights from REFINE SPECT registry. (June 2022)
- Main Title:
- Handling missing values in machine learning to predict patient-specific risk of adverse cardiac events: Insights from REFINE SPECT registry
- Authors:
- Rios, Richard
Miller, Robert J.H.
Manral, Nipun
Sharir, Tali
Einstein, Andrew J.
Fish, Mathews B.
Ruddy, Terrence D.
Kaufmann, Philipp A.
Sinusas, Albert J.
Miller, Edward J.
Bateman, Timothy M.
Dorbala, Sharmila
Di Carli, Marcelo
Van Kriekinge, Serge D.
Kavanagh, Paul B.
Parekh, Tejas
Liang, Joanna X.
Dey, Damini
Berman, Daniel S.
Slomka, Piotr J.
- Abstract:
- Background: Machine learning (ML) models can improve prediction of major adverse cardiovascular events (MACE), but in clinical practice some values may be missing. We evaluated the influence of missing values in ML models for patient-specific prediction of MACE risk.
Methods: We included 20,179 patients from the multicenter REFINE SPECT registry with MACE follow-up data. We evaluated seven methods for handling missing values: 1) removal of variables with missing values (ML-Remove), 2) imputation with median and unique category for continuous and categorical variables, respectively (ML-Traditional), 3) unique category for missing variables (ML-Unique), 4) cluster-based imputation (ML-Cluster), 5) regression-based imputation (ML-Regression), 6) missRanger imputation (ML-MR), and 7) multiple imputation (ML-MICE). We trained ML models with full data and simulated missing values in testing patients. Prediction performance was evaluated using area under the receiver-operating characteristic curve (AUC) and compared with a model without missing values (ML-All), expert visual diagnosis and total perfusion deficit (TPD).
Results: During mean follow-up of 4.7 ± 1.5 years, 3,541 patients experienced at least one MACE (3.7% annualized risk). ML-All (reference model, no missing values) had AUC 0.799 for MACE risk prediction. All seven models with missing values had lower AUC (ML-Remove: 0.778, ML-MICE: 0.774, ML-Cluster: 0.771, ML-Traditional: 0.771, ML-Regression: 0.770, ML-MR: 0.766, and ML-Unique: 0.766; p < 0.01 for ML-Remove vs remaining methods). Stress TPD (AUC 0.698) and visual diagnosis (0.681) had the lowest AUCs.
Conclusion: Missing values reduce the accuracy of ML models when predicting MACE risk. Removing variables with missing values and retraining the model may yield superior patient-level prediction performance.
Highlights: Removing variables with missing values and retraining was the most accurate method. Multiple imputation had the highest accuracy of the imputation methods. Having a selection of reduced ML models may be a practical clinical solution.
- Is Part Of:
- Computers in biology and medicine. Volume 145(2022)
- Journal:
- Computers in biology and medicine
- Issue:
- Volume 145(2022)
- Issue Display:
- Volume 145, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 145
- Issue:
- 2022
- Issue Sort Value:
- 2022-0145-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-06
- Subjects:
- Machine learning -- Clinical implementation -- Missing values -- Prognosis -- Myocardial perfusion imaging
AUC area under the receiver-operating characteristic curve -- CAD coronary artery disease -- DBP diastolic blood pressure -- ECG electrocardiogram -- MACE major adverse cardiovascular events -- ML machine learning -- MPI myocardial perfusion imaging -- SBP systolic blood pressure
Medicine -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00104825/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiomed.2022.105449
- Languages:
- English
- ISSNs:
- 0010-4825
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.880000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21569.xml
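
The abstract describes ML-Traditional as imputing the median for continuous variables and a unique category for categorical variables. The Python sketch below illustrates that idea only; it is not the authors' implementation. The function names, the `"missing"` label, and the toy variables (`sbp`, `sex`) with their values are all hypothetical and are not registry data.

```python
from statistics import median

MISSING = "missing"  # hypothetical label for the extra "unique" category


def fit_imputer(train_rows, continuous, categorical):
    """Learn one median per continuous variable from the training rows."""
    medians = {}
    for name in continuous:
        observed = [r[name] for r in train_rows if r.get(name) is not None]
        medians[name] = median(observed)
    return {"medians": medians, "categorical": list(categorical)}


def impute(row, imputer):
    """Return a copy of `row` with missing (None) values filled in."""
    filled = dict(row)
    # Continuous variables: substitute the training-set median.
    for name, value in imputer["medians"].items():
        if filled.get(name) is None:
            filled[name] = value
    # Categorical variables: assign the dedicated "missing" category.
    for name in imputer["categorical"]:
        if filled.get(name) is None:
            filled[name] = MISSING
    return filled


# Toy training data (made up for illustration only).
train = [
    {"sbp": 120.0, "sex": "F"},
    {"sbp": 140.0, "sex": "M"},
    {"sbp": 130.0, "sex": "F"},
]
imputer = fit_imputer(train, continuous=["sbp"], categorical=["sex"])

# A test patient with both values missing.
patient = impute({"sbp": None, "sex": None}, imputer)
print(patient)
```

The medians are learned from the training rows alone and then applied to the test patient, which mirrors the abstract's setup of training on full data and simulating missing values only at test time.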