A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data. (November 2021)
- Record Type:
- Journal Article
- Title:
- A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data. (November 2021)
- Main Title:
- A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data
- Authors:
- Khalid, Sara
Yang, Cynthia
Blacketer, Clair
Duarte-Salles, Talita
Fernández-Bertolín, Sergio
Kim, Chungsoo
Park, Rae Woong
Park, Jimyung
Schuemie, Martijn J.
Sena, Anthony G.
Suchard, Marc A.
You, Seng Chan
Rijnbeek, Peter R.
Reps, Jenna M.
- Abstract:
- Highlights: Harmonization and quality control of originally heterogenous observational databases. Large-scale application of machine learning methods in a distributed data network. Transparent use of open-source software tools and publicly shared analytical code.
Abstract: Background and objective: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code).
Methods: We show step-by-step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA.
Results: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated.
Conclusion: Our results show that following the OHDSI analytics pipeline for patient-level prediction modelling can enable the rapid development towards reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 211(2021)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 211(2021)
- Issue Display:
- Volume 211, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 211
- Issue:
- 2021
- Issue Sort Value:
- 2021-0211-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-11
- Subjects:
- COVID-19 -- Data harmonization -- Data quality control -- Distributed data network -- Machine learning -- Risk prediction
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2021.106394
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25519.xml