Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology. (2021)
- Record Type:
- Journal Article
- Title:
- Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology. (2021)
- Main Title:
- Two-Stage Approaches to Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models in Oncology
- Authors:
- Oh, Eun Jeong
- Parikh, Ravi B.
- Chivers, Corey
- Chen, Jinbo
- Abstract:
- PURPOSE: Machine learning models developed from electronic health record data have been increasingly used to predict risk of mortality for general oncology patients. However, these models may have suboptimal performance because of patient heterogeneity. The objective of this work is to develop a new modeling approach to predicting short-term mortality that accounts for heterogeneity across multiple subgroups in the presence of a large number of electronic health record predictors.
METHODS: We proposed a two-stage approach to addressing heterogeneity among oncology patients of different cancer types for predicting their risk of mortality. Structured data were extracted from the University of Pennsylvania Health System for 20,723 patients of 11 cancer types, of whom 1,340 (6.5%) were deceased. We first modeled the overall risk for all patients without differentiating cancer types, as is done in current practice. We then developed cancer type–specific models using the overall risk score as a predictor along with preselected type-specific predictors. The overall and type-specific models were compared with respect to discrimination using the area under the precision-recall curve (AUPRC) and calibration using the calibration slope. We also proposed metrics that characterize the degree of risk heterogeneity by comparing risk predictors in the overall and type-specific models.
RESULTS: The two-stage modeling resulted in improved calibration and discrimination across all 11 cancer types. The improvement in AUPRC was significant for hematologic malignancies including leukemia, lymphoma, and myeloma. For instance, the AUPRC increased from 0.358 to 0.519 (Δ = 0.161; 95% CI, 0.102 to 0.224) and from 0.299 to 0.354 (Δ = 0.055; 95% CI, 0.009 to 0.107) for leukemia and lymphoma, respectively. For all 11 cancer types, the two-stage approach generated well-calibrated risks. A high degree of heterogeneity between type-specific and overall risk predictors was observed for most cancer types.
CONCLUSION: Our two-stage modeling approach that accounts for cancer type–specific risk heterogeneity has better calibration and discrimination than a model agnostic to cancer types.
- Is Part Of:
- JCO Clinical Cancer Informatics. Volume 5(2021)
- Journal:
- JCO Clinical Cancer Informatics
- Issue:
- Volume 5(2021)
- Issue Display:
- Volume 5, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 5
- Issue:
- 2021
- Issue Sort Value:
- 2021-0005-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021
- Subjects:
- 616.994
- Journal URLs:
- http://journals.lww.com/pages/default.aspx
- DOI:
- 10.1200/CCI.21.00077
- Languages:
- English
- ISSNs:
- 2473-4276
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
- British Library HMNTS - ELD Digital store
- Ingest File:
- 21252.xml