A new approach to configurable primary data collection. Issue 133 (September 2016)
- Record Type:
- Journal Article
- Title:
- A new approach to configurable primary data collection. Issue 133 (September 2016)
- Main Title:
- A new approach to configurable primary data collection
- Authors:
- Stanek, J.
Babkin, E.
Zubov, M.
- Abstract:
- Highlights: We present a web-based user-configurable primary data collection tool. To collect data from non-cooperative sources, control flow is recovered from data flow. Data are transformed, sanitized and safely uploaded to the target system. Data transformation/sanitization at the source reduces confidentiality issues. The tool offers the option of manual entry of data absent in the source data sets.
Abstract:
Background and objectives: The formats, semantics and operational rules of data processing tasks in genomics (and health in general) are highly divergent and can change rapidly. In such an environment, consistent transformation and loading of heterogeneous input data into various target repositories becomes a critical success factor. The objective of the project was to design a new conceptual approach to configurable data transformation, de-identification, and submission of health and genomic data sets. The main motivation was to facilitate automated or human-driven data uploading, as well as consolidation of heterogeneous sources in large genomic or health projects.
Methods: Modern methods of on-demand specialization of generic software components were applied. For specification of input–output data and required data collection activities, we propose a simple data model of flat tables as well as a domain-oriented graphical interface and a portable representation of transformations in XML. Using such methods, the prototype of the Configurable Data Collection System (CDCS) was implemented in the Java programming language with Swing graphical interfaces. The core logic of transformations was implemented as a library of reusable plugins.
Results: The solution, CDCS, is implemented as a software prototype of a configurable service-oriented system for semi-automatic data collection, transformation, sanitization and safe uploading to heterogeneous data repositories. To address the dynamic nature of data schemas and data collection processes, the CDCS prototype facilitates interactive, user-driven configuration of the data collection process and extends basic functionality with a wide range of third-party plugins. Notably, our solution also allows for the reduction of manual data entry for data originally missing in the output data sets.
Conclusions: First experiments and feedback from domain experts confirm that the prototype is flexible, configurable and extensible; runs well on data owners' systems; and does not depend on vendors' standards.
- Is Part Of:
- Computer methods and programs in biomedicine. Issue 133(2016)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Issue 133(2016)
- Issue Display:
- Volume 133, Issue 133 (2016)
- Year:
- 2016
- Volume:
- 133
- Issue:
- 133
- Issue Sort Value:
- 2016-0133-0133-0000
- Page Start:
- 169
- Page End:
- 181
- Publication Date:
- 2016-09
- Subjects:
- Genomic data -- Variety -- Extraction–transformation–loading -- Customization -- Manual entry optimization
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2016.05.007
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8057.xml