A semantic‐aware data generator for ETL workflows. (22nd April 2013)
- Record Type:
- Journal Article
- Title:
- A semantic‐aware data generator for ETL workflows. (22nd April 2013)
- Main Title:
- A semantic‐aware data generator for ETL workflows
- Authors:
- Du, Naiqiao
Ye, Xiaojun
Wang, Jianmin
- Other Names:
- Simmhan, Yogesh (guest editor)
Ramakrishnan, Lavanya (guest editor)
Antoniu, Gabriel (guest editor)
Goble, Carole (guest editor)
Yu, Yong (guest editor)
Mu, Yi (guest editor)
Lu, Rongxing (guest editor)
Ren, Jian (guest editor)
Venticinque, Salvatore (guest editor)
Camacho, David (guest editor)
- Abstract:
- Summary: Extract, transform, and load (ETL) processes organized as workflows play an important role in future data integration for cloud services. ETL designers and administrators need test data sets that reflect the semantics of ETL workflow workloads in order to evaluate the ETL systems they develop, but populating ETL systems under test with meaningful workload data is difficult. In this paper, we propose a semantic‐aware data generator for ETL workflows. Given ETL workflow models and workload characterizations, the generator produces synthetic data that capture the semantics of ETL activities. It does so in three stages. First, we derive the expected cardinalities of all source, intermediate, and target data sets involved in the ETL workflow model from user‐specified cardinality requirements. Second, using the concept of symbolic testing, symbolic rather than concrete data are generated for the ETL activities, and the semantics of the ETL workflow models are transformed into constraints over these symbols. Finally, concrete data are derived by solving these constraints. Our generator may facilitate ETL workload test case generation for performance and functional evaluation of ETL toolkits, as well as benchmarking of ETL workflow solutions. Copyright © 2013 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 28:Number 4(2016)
- Journal:
- Concurrency and computation
- Issue:
- Volume 28:Number 4(2016)
- Issue Display:
- Volume 28, Issue 4 (2016)
- Year:
- 2016
- Volume:
- 28
- Issue:
- 4
- Issue Sort Value:
- 2016-0028-0004-0000
- Page Start:
- 1016
- Page End:
- 1040
- Publication Date:
- 2013-04-22
- Subjects:
- ETL workflow -- workload characterization -- symbolic test -- synthetic data
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3028
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 5.xml