An automated data verification approach for improving data quality in a clinical registry. (November 2019)
- Record Type:
- Journal Article
- Title:
- An automated data verification approach for improving data quality in a clinical registry. (November 2019)
- Main Title:
- An automated data verification approach for improving data quality in a clinical registry
- Authors:
- Tian, Qi
Liu, Mengzhou
Min, Lingtong
An, Jiye
Lu, Xudong
Duan, Huilong
- Abstract:
- Highlights: Proposed and implemented an automated data verification approach for registry data quality assessment and improvement. Paper-based documents and electronic medical records are used to verify registry data automatically. Machine learning enhanced optical character recognition is used to recognize paper-based documents more accurately. The automated approach is more accurate and efficient than the traditional manual approach at identifying incomplete and incorrect data in a registry study.
Abstract: Background and Objective: The quality of data is crucial for clinical registry studies as it impacts credibility. In the regular practice of most such studies, a vulnerability arises from researchers recording data on paper-based case report forms (CRFs) and then transcribing them into registry databases. To ensure the quality of data, verifying data in the registry is necessary. However, traditional manual data verification methods are time-consuming, labor-intensive and of limited effect. As paper-based CRFs and electronic medical records (EMRs) are two sources for verification, we propose an automated data verification approach based on the techniques of optical character recognition (OCR) and information retrieval to identify data errors in a registry more efficiently.
Methods: The automated verification approach involves three steps. First, we analyze the scanned images of paper-based CRFs with machine learning enhanced OCR to recognize the checkbox marks and handwriting. Then, we retrieve the related patient information from the EMRs using natural language processing (NLP) techniques. Finally, we compare the information retrieved in the previous two steps with the data in the registry, and synthesize the results accordingly. The proposed automated method has been applied in a Chinese registry study, and the differences between the automated and manual approaches have been evaluated.
Results: The automated approach has been implemented in the Chinese Coronary Artery Disease Registry. For CRF data recognition, the recognition accuracies for checkbox marks and handwriting are 0.93 and 0.74, respectively. For EMR data extraction, the accuracy of information retrieval from textual electronic medical records is 0.97. The accuracy, recall and time consumption of the automated approach are 0.93, 0.96 and 0.5 h, better than the corresponding values of the manual approach, which are 0.92, 0.71 and 7.5 h.
Conclusions: Compared to the manual data verification approach, the automated approach improves recall in identifying data errors and has a higher accuracy, while consuming far less time. The results show that the automated approach is more effective and efficient for identifying incomplete and incorrect data in a registry. The proposed approach has the potential to improve the quality of registry data.
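The comparison step and the reported accuracy and recall figures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the field names, and the policy of preferring the CRF value and falling back to the EMR value are all assumptions made for illustration, and the OCR and NLP steps are represented only by already-extracted dictionaries.

```python
def verify_record(registry, crf, emr):
    """Flag each registry field as 'ok', 'incomplete', or 'incorrect'.

    registry, crf, emr: dicts mapping field name -> value (hypothetical layout).
    """
    flags = {}
    for field, value in registry.items():
        # Assumed policy: use the CRF value as reference, else the EMR value.
        reference = crf.get(field)
        if reference is None:
            reference = emr.get(field)
        if value is None or value == "":
            flags[field] = "incomplete"
        elif reference is not None and value != reference:
            flags[field] = "incorrect"
        else:
            flags[field] = "ok"
    return flags


def accuracy_and_recall(flags, true_errors):
    """Score flagged fields against a manually confirmed set of erroneous fields."""
    fields = list(flags)
    predicted = {f for f in fields if flags[f] != "ok"}
    true_pos = len(predicted & true_errors)
    true_neg = sum(1 for f in fields if f not in predicted and f not in true_errors)
    accuracy = (true_pos + true_neg) / len(fields)
    recall = true_pos / len(true_errors) if true_errors else 1.0
    return accuracy, recall


# Toy data, invented for illustration only.
registry = {"sex": "M", "age": "", "diabetes": "yes", "smoker": "no"}
crf = {"sex": "M", "age": "63", "diabetes": "no"}
emr = {"diabetes": "no", "smoker": "no"}

flags = verify_record(registry, crf, emr)
accuracy, recall = accuracy_and_recall(flags, {"age", "diabetes", "smoker"})
print(flags, accuracy, recall)
```

Here recall is the fraction of truly erroneous fields that the check flags, which is the quantity on which the abstract reports the largest gap between the automated and manual approaches.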
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 181(2020)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 181(2020)
- Issue Display:
- Volume 181, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 181
- Issue:
- 2020
- Issue Sort Value:
- 2020-0181-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-11
- Subjects:
- Data quality -- Clinical registry -- Automated data verification -- Data quality improvement
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2019.01.012
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12168.xml