A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications. (20th September 2021)
- Record Type:
- Journal Article
- Title:
- A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications. (20th September 2021)
- Main Title:
- A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications
- Authors:
- Ataie, Ehsan
Evangelinou, Athanasia
Gianniti, Eugenio
Ardagna, Danilo
- Abstract:
- Nowadays, Apache Hadoop and Apache Spark are two of the most prominent distributed solutions for processing big data applications on the market. Since in many cases these frameworks are adopted to support business-critical activities, it is often important to predict with fair confidence the execution time of submitted applications, for instance when service-level agreements are established with end-users. In this work, we propose and validate a hybrid approach for the performance prediction of big data applications running on clouds, which exploits both analytical modeling and machine learning (ML) techniques and achieves good accuracy without requiring many time-consuming and costly experiments on a real setup. The experimental results show that, compared with applying ML techniques without any support from analytical models, the proposed approach improves accuracy, reduces the number of experiments to be run on the operational system and lowers cost. Moreover, we compare our approach with Ernest, an ML-based technique proposed in the literature by the Spark inventors. Experiments show that Ernest can accurately estimate performance in interpolating scenarios, but fails to predict performance when configurations with an increasing number of cores are considered. Finally, a comparison with a similar hybrid approach proposed in the literature demonstrates that our approach significantly reduces prediction errors, especially when few experiments on the real system are performed.
- Is Part Of:
- Computer journal. Volume 65:Number 12(2022)
- Journal:
- Computer journal
- Issue:
- Volume 65:Number 12(2022)
- Issue Display:
- Volume 65, Issue 12 (2022)
- Year:
- 2022
- Volume:
- 65
- Issue:
- 12
- Issue Sort Value:
- 2022-0065-0012-0000
- Page Start:
- 3123
- Page End:
- 3140
- Publication Date:
- 2021-09-20
- Subjects:
- analytical performance modeling -- machine learning -- cloud computing -- MapReduce -- Hadoop -- Tez -- Spark
Computers -- Periodicals
005.1
- Journal URLs:
- http://comjnl.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/comjnl/bxab131
- Languages:
- English
- ISSNs:
- 0010-4620
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.060000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24860.xml