Understanding and detecting defects in healthcare administration data: Toward higher data quality to better support healthcare operations and decisions. (16th December 2019)
- Record Type:
- Journal Article
- Title:
- Understanding and detecting defects in healthcare administration data: Toward higher data quality to better support healthcare operations and decisions. (16th December 2019)
- Main Title:
- Understanding and detecting defects in healthcare administration data: Toward higher data quality to better support healthcare operations and decisions
- Authors:
- Zhang, Yili
Koru, Güneş
- Abstract:
- Objective: Development of systematic approaches for understanding and assessing data quality is becoming increasingly important as the volume and utilization of health data steadily increases. In this study, a taxonomy of data defects was developed and utilized when automatically detecting defects to assess Medicaid data quality maintained by one of the states in the United States.
Materials and Methods: There were more than 2.23 million rows and 32 million cells in the Medicaid data examined. The taxonomy was developed through document review, descriptive data analysis, and literature review. A software program was created to automatically detect defects by using a set of constraints whose development was facilitated by the taxonomy.
Results: Five major categories and seventeen subcategories of defects were identified. The major categories are missingness, incorrectness, syntax violation, semantic violation, and duplicity. More than 3 million defects were detected indicating substantial problems with data quality. Defect density exceeded 10% in five tables. The majority of the data defects belonged to format mismatch, invalid code, dependency-contract violation, and implausible value types. Such contextual knowledge can support prioritized quality improvement initiatives for the Medicaid data studied.
Conclusions: This research took the initial steps to understand the types of data defects and detect defects in large healthcare datasets. The results generally suggest that healthcare organizations can potentially benefit from focusing on data quality improvement. For those purposes, the taxonomy developed and the approach followed in this study can be adopted.
- Is Part Of:
- Journal of the American Medical Informatics Association. Volume 27:Number 3(2020)
- Journal:
- Journal of the American Medical Informatics Association
- Issue:
- Volume 27:Number 3(2020)
- Issue Display:
- Volume 27, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 27
- Issue:
- 3
- Issue Sort Value:
- 2020-0027-0003-0000
- Page Start:
- 386
- Page End:
- 395
- Publication Date:
- 2019-12-16
- Subjects:
- data quality -- data defect -- defect taxonomy -- healthcare administration -- Medicaid management information system
Medical informatics -- Periodicals
Information Services -- Periodicals
Medical Informatics -- Periodicals
Médecine -- Informatique -- Périodiques
Informatica
Geneeskunde
Informatique médicale
Computer network resources
Electronic journals
610.285
- Journal URLs:
- http://jamia.bmj.com/
http://www.jamia.org
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=76
http://www.sciencedirect.com/science/journal/10675027
http://jamia.oxfordjournals.org/
http://www.oxfordjournals.org/en/
- DOI:
- 10.1093/jamia/ocz201
- Languages:
- English
- ISSNs:
- 1067-5027
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4689.025000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15139.xml