Experimenting with big data computing for scaling data quality-aware query processing. (15th September 2021)
- Record Type:
- Journal Article
- Title:
- Experimenting with big data computing for scaling data quality-aware query processing. (15th September 2021)
- Main Title:
- Experimenting with big data computing for scaling data quality-aware query processing
- Authors:
- Cisneros-Cabrera, Sonia
Michailidou, Anna-Valentini
Sampaio, Sandra
Sampaio, Pedro
Gounaris, Anastasios
- Abstract:
- Highlights: Empirical study aimed at "scaling up" data quality (DQ) management applications. Execution of data quality-aware queries over sensor-collected traffic data sets. Exploration of Apache Spark and the Pandas library to speed up query execution. Insights on choice of computational infrastructure to deploy DQ management tools.
Abstract: Combining query processing techniques with data quality management approaches enables enforcement of quality constraints, such as timeliness, accuracy and completeness, as part of ad-hoc query specification and execution, improving the quality of query results. Despite the emergence of novel data quality processing tools, there is a dearth of studies assessing performance and scalability in the execution of data quality assessment tasks during query processing. This paper reports on an empirical study aiming to investigate the extent to which a big data computing framework (Spark) can offer significant gains in performance and scalability when executing data quality querying tasks over a range of computational platforms, including a single commodity multi-core machine and a cluster-based platform, for a wide range of workloads. Our results show that substantial performance and scalability gains can be obtained by using optimized data science libraries combined with the parallel and distributed capabilities of big data computing. We also provide guidelines on choosing the appropriate computational infrastructure for executing DQ-aware queries.
- Is Part Of:
- Expert systems with applications. Volume 178(2021)
- Journal:
- Expert systems with applications
- Issue:
- Volume 178(2021)
- Issue Display:
- Volume 178, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 178
- Issue:
- 2021
- Issue Sort Value:
- 2021-0178-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-09-15
- Subjects:
- Data quality-aware queries -- Big data computing -- Empirical evaluation
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2021.114858
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16876.xml