Validating the extract, transform, load process used to populate a large clinical research database. (October 2016)
- Record Type:
- Journal Article
- Title:
- Validating the extract, transform, load process used to populate a large clinical research database. (October 2016)
- Main Title:
- Validating the extract, transform, load process used to populate a large clinical research database
- Authors:
- Denney, Michael J.
Long, Dustin M.
Armistead, Matthew G.
Anderson, Jamie L.
Conway, Baqiyyah N.
- Abstract:
- Background: Informaticians at any institution that are developing clinical research support infrastructure are tasked with populating research databases with data extracted and transformed from their institution's operational databases, such as electronic health records (EHRs). These data must be properly extracted from these source systems, transformed into a standard data structure, and then loaded into the data warehouse while maintaining the integrity of these data. We validated the correctness of the extract, load, and transform (ETL) process of the extracted data of West Virginia Clinical and Translational Science Institute's Integrated Data Repository, a clinical data warehouse that includes data extracted from two EHR systems. Methods: Four hundred ninety-eight observations were randomly selected from the integrated data repository and compared with the two source EHR systems. Results: Of the 498 observations, there were 479 concordant and 19 discordant observations. The discordant observations fell into three general categories: a) design decision differences between the IDR and source EHRs, b) timing differences, and c) user interface settings. After resolving apparent discordances, our integrated data repository was found to be 100% accurate relative to its source EHR systems. Conclusion: Any institution that uses a clinical data warehouse that is developed based on extraction processes from operational databases, such as EHRs, employs some form of an ETL process. As secondary use of EHR data begins to transform the research landscape, the importance of the basic validation of the extracted EHR data cannot be underestimated and should start with the validation of the extraction process itself.
- Is Part Of:
- International journal of medical informatics. Volume 94(2016)
- Journal:
- International journal of medical informatics
- Issue:
- Volume 94(2016)
- Issue Display:
- Volume 94, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 94
- Issue:
- 2016
- Issue Sort Value:
- 2016-0094-2016-0000
- Page Start:
- 271
- Page End:
- 274
- Publication Date:
- 2016-10
- Subjects:
- Correctness -- Clinical data warehouse -- Electronic health record -- Extract transform load -- Informatics
Medical informatics -- Periodicals
Information science -- Periodicals
Computers -- Periodicals
Medical technology -- Periodicals
Medical Informatics -- Periodicals
Technology, Medical -- Periodicals
Computers
Information science
Medical informatics
Medical technology
Electronic journals
Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/13865056
http://www.clinicalkey.com/dura/browse/journalIssue/13865056
http://www.clinicalkey.com.au/dura/browse/journalIssue/13865056
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ijmedinf.2016.07.009
- Languages:
- English
- ISSNs:
- 1386-5056
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.345250
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 37.xml