Fairness in dataflow scheduling in the cloud. (July 2019)
- Record Type:
- Journal Article
- Title:
- Fairness in dataflow scheduling in the cloud. (July 2019)
- Main Title:
- Fairness in dataflow scheduling in the cloud
- Authors:
- Pietri, Ilia
Chronis, Yannis
Ioannidis, Yannis
- Abstract:
- Expensive dataflow queries which may involve large-scale computations operating on significant volumes of data are typically executed on distributed platforms to improve application performance. Among these, cloud computing has emerged as an attractive option for users to execute dataflows allowing them to select proper configurations (e.g., number of machines) to achieve desired trade-offs between execution time and monetary cost. Discovering dataflow schedules that exhibit the best trade-offs within a plethora of potential solutions can be challenging, especially in a heterogeneous environment where resource characteristics like performance and price can be varied. To increase resource utilization, users may also submit multiple dataflows for execution concurrently. Traditionally, building fair schedules (schedules where the slowdown of all dataflows due to resource sharing is similar) while achieving good performance is a major concern. However, considering fairness in the cloud computing setting where monetary cost is part of the optimization objectives significantly increases the difficulty of the scheduling problem. This paper proposes an algorithm for the scheduling of multiple dataflows on heterogeneous clouds that identifies Pareto-optimal solutions (schedules) in the three-dimensional space formed from the different trade-offs between overall execution time, monetary cost and fairness. The results show that in most cases the proposed approach can provide solutions with fairer schedules without significantly impacting the quality of the execution time to monetary cost skyline compared to the state of the art where the fairness of a solution is not taken into account. Highlights: Fairness for the scheduling of multiple dataflows on the Cloud where cost is crucial. Heuristic for Pareto-efficient solutions with respect to makespan, cost and fairness. Impact of the prioritization scheme and the pruning method used on the skyline. …
- Is Part Of:
- Information systems. Volume 83(2019)
- Journal:
- Information systems
- Issue:
- Volume 83(2019)
- Issue Display:
- Volume 83, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 83
- Issue:
- 2019
- Issue Sort Value:
- 2019-0083-2019-0000
- Page Start:
- 118
- Page End:
- 125
- Publication Date:
- 2019-07
- Subjects:
- Cloud computing -- Multiple dataflows -- Fairness -- Dataflow scheduling
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2019.03.003
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10123.xml