Testing calibration of phenotyping models using positive-only electronic health record data. (22nd February 2021)
- Record Type:
- Journal Article
- Title:
- Testing calibration of phenotyping models using positive-only electronic health record data. (22nd February 2021)
- Main Title:
- Testing calibration of phenotyping models using positive-only electronic health record data
- Authors:
- Zhang, Lingjiao
Ma, Yanyuan
Herman, Daniel
Chen, Jinbo
- Abstract:
- Summary: Validation of phenotyping models using Electronic Health Records (EHRs) data conventionally requires gold-standard case and control labels. The labeling process requires clinical experts to retrospectively review patients' medical charts, therefore is labor intensive and time consuming. For some disease conditions, it is prohibitive to identify the gold-standard controls because routine clinical assessments are performed for selective patients who are deemed to possibly have the condition. To build a model for phenotyping patients in EHRs, the most readily accessible data are often for a cohort consisting of a set of gold-standard cases and a large number of unlabeled patients. Hereby, we propose methods for assessing model calibration and discrimination using such "positive-only" EHR data that does not require gold-standard controls, provided that the labeled cases are representative of all cases. For model calibration, we propose a novel statistic that aggregates differences between model-free and model-based estimated numbers of cases across risk subgroups, which asymptotically follows a Chi-squared distribution. We additionally demonstrate that the calibration slope can also be estimated using such "positive-only" data. We propose consistent estimators for discrimination measures and derive their large sample properties. We demonstrate performances of the proposed methods through extensive simulation studies and apply them to Penn Medicine EHRs to validate two preliminary models for predicting the risk of primary aldosteronism.
- Is Part Of:
- Biostatistics. Volume 23:Number 3(2022)
- Journal:
- Biostatistics
- Issue:
- Volume 23:Number 3(2022)
- Issue Display:
- Volume 23, Issue 3 (2022)
- Year:
- 2022
- Volume:
- 23
- Issue:
- 3
- Issue Sort Value:
- 2022-0023-0003-0000
- Page Start:
- 844
- Page End:
- 859
- Publication Date:
- 2021-02-22
- Subjects:
- Calibration -- Discrimination -- Electronic health records -- Positive-only data -- Phenotyping
Medical statistics -- Periodicals
Biometry -- Periodicals
Health risk assessment -- Periodicals
Medicine -- Research -- Statistical methods -- Periodicals
610.727
- Journal URLs:
- http://www3.oup.co.uk/biosts
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/biostatistics/kxab003
- Languages:
- English
- ISSNs:
- 1465-4644
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2089.628000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22541.xml