Towards Automatic Data Format Transformations: Data Wrangling at Scale. (1st December 2018)
- Record Type:
- Journal Article
- Title:
- Towards Automatic Data Format Transformations: Data Wrangling at Scale. (1st December 2018)
- Main Title:
- Towards Automatic Data Format Transformations: Data Wrangling at Scale
- Authors:
- Bogatu, Alex
Paton, Norman W
Fernandes, Alvaro A A
Koehler, Martin
- Editors:
- Wood, Peter
- Abstract:
- Data wrangling is the process whereby data are cleaned and integrated for analysis. Data wrangling, even with tool support, is typically a labour intensive process. One aspect of data wrangling involves carrying out format transformations on attribute values, for example so that names or phone numbers are represented consistently. Recent research has developed techniques for synthesizing format transformation programs from examples of the source and target representations. This is valuable, but still requires a user to provide suitable examples, something that may be challenging in applications in which there are huge datasets or numerous data sources. In this paper, we investigate the automatic discovery of examples that can be used to synthesize format transformation programs. In particular, we propose two approaches to identifying candidate data examples and validating the transformations that are synthesized from them. The approaches are evaluated empirically using datasets from open government data.
- Is Part Of:
- Computer journal. Volume 62:Number 7(2019)
- Journal:
- Computer journal
- Issue:
- Volume 62:Number 7(2019)
- Issue Display:
- Volume 62, Issue 7 (2019)
- Year:
- 2019
- Volume:
- 62
- Issue:
- 7
- Issue Sort Value:
- 2019-0062-0007-0000
- Page Start:
- 1044
- Page End:
- 1060
- Publication Date:
- 2018-12-01
- Subjects:
- format transformations -- data wrangling -- program synthesis
Computers -- Periodicals
005.1
- Journal URLs:
- http://comjnl.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/comjnl/bxy118
- Languages:
- English
- ISSNs:
- 0010-4620
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.060000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25154.xml