Corrected ROC analysis for misclassified binary outcomes. (28th February 2017)
- Record Type:
- Journal Article
- Title:
- Corrected ROC analysis for misclassified binary outcomes. (28th February 2017)
- Main Title:
- Corrected ROC analysis for misclassified binary outcomes
- Authors:
- Zawistowski, Matthew
Sussman, Jeremy B.
Hofer, Timothy P.
Bentley, Douglas
Hayward, Rodney A.
Wiitala, Wyndy L.
- Abstract:
- Creating accurate risk prediction models from Big Data resources such as Electronic Health Records (EHRs) is a critical step toward achieving precision medicine. A major challenge in developing these tools is accounting for imperfect aspects of EHR data, particularly the potential for misclassified outcomes. Misclassification, the swapping of case and control outcome labels, is well known to bias effect size estimates for regression prediction models. In this paper, we study the effect of misclassification on accuracy assessment for risk prediction models and find that it leads to bias in the area under the curve (AUC) metric from standard ROC analysis. The extent of the bias is determined by the false positive and false negative misclassification rates as well as disease prevalence. Notably, we show that simply correcting for misclassification while building the prediction model is not sufficient to remove the bias in AUC. We therefore introduce an intuitive misclassification‐adjusted ROC procedure that accounts for uncertainty in observed outcomes and produces bias‐corrected estimates of the true AUC. The method requires that misclassification rates are either known or can be estimated, quantities typically required for the modeling step. The computational simplicity of our method is a key advantage, making it ideal for efficiently comparing multiple prediction models on very large datasets. Finally, we apply the correction method to a hospitalization prediction model from a cohort of over 1 million patients from the Veterans Health Administration's EHR. Implementations of the ROC correction are provided for Stata and R. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
- Is Part Of:
- Statistics in medicine. Volume 36: Number 13 (2017)
- Journal:
- Statistics in medicine
- Issue:
- Volume 36: Number 13 (2017)
- Issue Display:
- Volume 36, Issue 13 (2017)
- Year:
- 2017
- Volume:
- 36
- Issue:
- 13
- Issue Sort Value:
- 2017-0036-0013-0000
- Page Start:
- 2148
- Page End:
- 2160
- Publication Date:
- 2017-02-28
- Subjects:
- misclassification -- ROC analysis -- risk prediction modeling -- electronic health records -- precision medicine
Medical statistics -- Periodicals
Statistique médicale -- Périodiques
Statistiques médicales -- Périodiques
610.727
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/sim.7260
- Languages:
- English
- ISSNs:
- 0277-6715
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8453.576000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 2167.xml