A machine learning approach to identifying patients with pulmonary hypertension using real-world electronic health records. (1st March 2023)
- Record Type:
- Journal Article
- Title:
- A machine learning approach to identifying patients with pulmonary hypertension using real-world electronic health records. (1st March 2023)
- Main Title:
- A machine learning approach to identifying patients with pulmonary hypertension using real-world electronic health records
- Authors:
- Kogan, Emily
Didden, Eva-Maria
Lee, Eileen
Nnewihe, Anderson
Stamatiadis, Dimitri
Mataraso, Samson
Quinn, Deborah
Rosenberg, Daniel
Chehoud, Christel
Bridges, Charles
- Abstract:
- Background: This study aimed to develop a machine learning (ML) model to identify patients who are likely to have pulmonary hypertension (PH), using a large patient-level US-based electronic health record (EHR) database.
Methods: A gradient boosting model, XGBoost, was developed using data from Optum's US-based de-identified EHR dataset (2007–2019). PH and disease control adult patients were identified using diagnostic, treatment and procedure codes and were randomly split into the training (90%) or test set (10%). Model features included patient demographics, physician visits, diagnoses, procedures, prescriptions, and laboratory test results. SHapley Additive exPlanations values were used to determine feature importance.
Results: We identified 11,279,478 control and 115,822 PH patients (mean age, respectively: 62 and 68 years, both 53% female). The final model used 165 features, with the most important predictive features including diagnosis of heart failure, shortness of breath and atrial fibrillation. The model predicted PH with an area under the receiver operating characteristic curve (AUROC) of 0.92. AUROC remained above 0.80 for the prediction of PH up to and beyond 18 months before diagnosis. Among the PH patients, we also identified 955 pulmonary arterial hypertension (PAH) and 1432 chronic thromboembolic pulmonary hypertension (CTEPH) patients, and the range of AUROCs obtained for these cohorts was 0.79–0.90 and 0.87–0.96, respectively.
Conclusions: This model to detect PH based on patients' EHR records is viable and performs well in subgroups of PAH and CTEPH patients. This approach has the potential to improve patient outcomes by reducing diagnostic delay in PH.
Highlights: We built a machine learning model using electronic health records to detect PH. Our model retrospectively predicted PH up to 18 months before clinical diagnosis. Our model performed well in detection of two treatable PH subgroups. This approach could reduce diagnostic delay in PH and improve patient outcomes.
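For readers unfamiliar with the AUROC metric quoted in the abstract: it can be read as the probability that a randomly chosen PH patient receives a higher model score than a randomly chosen control, with ties counted as one half. The following is a minimal sketch of that definition only, not the authors' pipeline; the function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: P(positive score > negative score), ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    # Compare every positive score with every negative score.
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the study): 1 = PH, 0 = control.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

The pairwise loop is quadratic in cohort size, so it suits illustration only; at the scale described in the abstract a rank-based implementation from an established library would be the practical choice.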
- Is Part Of:
- International journal of cardiology. Volume 374(2023)
- Journal:
- International journal of cardiology
- Issue:
- Volume 374(2023)
- Issue Display:
- Volume 374, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 374
- Issue:
- 2023
- Issue Sort Value:
- 2023-0374-2023-0000
- Page Start:
- 95
- Page End:
- 99
- Publication Date:
- 2023-03-01
- Subjects:
- Artificial intelligence -- Machine learning -- Pulmonary hypertension -- Diagnostic delay -- Early diagnosis -- Electronic health record
Cardiology -- Periodicals
Electronic journals
616.12
- Journal URLs:
- http://www.clinicalkey.com/dura/browse/journalIssue/01675273
http://www.sciencedirect.com/science/journal/01675273
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ijcard.2022.12.016
- Languages:
- English
- ISSNs:
- 0167-5273
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.158000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25677.xml