Towards a content agnostic computable knowledge repository for data quality assessment. (August 2019)
- Record Type:
- Journal Article
- Title:
- Towards a content agnostic computable knowledge repository for data quality assessment. (August 2019)
- Main Title:
- Towards a content agnostic computable knowledge repository for data quality assessment
- Authors:
- Rajan, Naresh Sundar
Gouripeddi, Ramkiran
Mo, Peter
Madsen, Randy K.
Facelli, Julio C.
- Abstract:
- Highlights: We identified research gaps in the data quality literature towards automating data quality assessment (DQA) methods. In this process, we designed, developed and implemented a computable data quality knowledge repository for assessing quality and characterizing data in health data repositories. We leveraged a service-oriented architecture towards a scalable, reproducible framework for disparate biomedical data sources.
Abstract: Background and objective: In recent years, several data quality conceptual frameworks have been proposed across the Data Quality and Information Quality domains for assessing the quality of data. These frameworks are diverse, varying from simple lists of concepts to complex ontological and taxonomical representations of data quality concepts. The goal of this study is to design, develop and implement a platform-agnostic computable data quality knowledge repository for data quality assessments. Methods: We identified computable data quality concepts by performing a comprehensive literature review of articles indexed in three major bibliographic data sources. From this corpus, we extracted data quality concepts, their definitions, applicable measures and their computability, and identified conceptual relationships. We used these relationships to design and develop a data quality meta-model and implemented it in a quality knowledge repository. Results: We identified three primitives for programmatically performing data quality assessments: data quality concept, its definition, its measure or rule for data quality assessment, and their associations. We modeled a computable data quality meta-data repository and extended this framework to adapt, store, retrieve and automate assessment of other existing data quality assessment models. Conclusion: We identified research gaps in the data quality literature towards automating data quality assessment methods. In this process, we designed, developed and implemented a computable data quality knowledge repository for assessing quality and characterizing data in health data repositories. We leverage this knowledge repository in a service-oriented architecture to provide a scalable and reproducible framework for data quality assessments in disparate biomedical data sources.
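The primitives named in the Results (a data quality concept, its definition, a measure or rule, and the associations between them) can be illustrated with a minimal sketch. This is not the authors' meta-model or repository: the class names `Concept`, `Measure` and `Repository`, the `fill_rate` rule, the "completeness" definition and the sample records are all hypothetical, chosen only to show how a concept-to-measure association could drive an automated assessment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A record is a plain mapping of column name to value (hypothetical shape).
Record = Dict[str, Optional[str]]


@dataclass(frozen=True)
class Concept:
    """A data quality concept together with its definition."""
    name: str
    definition: str


@dataclass(frozen=True)
class Measure:
    """A computable rule that scores a list of records."""
    name: str
    rule: Callable[[List[Record]], float]


@dataclass
class Repository:
    """Stores concepts and the measures associated with each concept."""
    concepts: Dict[str, Concept] = field(default_factory=dict)
    measures: Dict[str, List[Measure]] = field(default_factory=dict)

    def register(self, concept: Concept, measure: Measure) -> None:
        # Record the concept and associate the measure with it.
        self.concepts[concept.name] = concept
        self.measures.setdefault(concept.name, []).append(measure)

    def assess(self, concept_name: str, records: List[Record]) -> Dict[str, float]:
        # Run every measure associated with the concept; unknown concepts yield {}.
        return {m.name: m.rule(records) for m in self.measures.get(concept_name, [])}


def fill_rate(column: str) -> Callable[[List[Record]], float]:
    """Build a rule returning the fraction of records with a non-empty value."""
    def rule(records: List[Record]) -> float:
        if not records:
            return 0.0
        filled = sum(1 for r in records if r.get(column) not in (None, ""))
        return filled / len(records)
    return rule


repo = Repository()
repo.register(
    Concept("completeness", "Degree to which expected values are present."),
    Measure("fill_rate(birth_date)", fill_rate("birth_date")),
)

# Hypothetical sample data: two of four records carry a birth date.
records = [
    {"birth_date": "1980-01-01"},
    {"birth_date": None},
    {"birth_date": ""},
    {"birth_date": "1975-06-30"},
]
result = repo.assess("completeness", records)
print(result)
```

Because the rules are stored as data rather than hard-coded per dataset, the same `assess` call could in principle be applied to any source that can be presented as a list of records, which is the sense of "content agnostic" assumed in this sketch.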
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 177(2019)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 177(2019)
- Issue Display:
- Volume 177, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 177
- Issue:
- 2019
- Issue Sort Value:
- 2019-0177-2019-0000
- Page Start:
- 193
- Page End:
- 201
- Publication Date:
- 2019-08
- Subjects:
- Data Quality Metadata Repository -- Knowledge representation -- Data quality assessment -- Data quality dimensions -- Data quality framework
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2019.05.017
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11049.xml