Towards optimized scheduling for data‐intensive scientific workflow in multiple datacenter environment. (27th July 2015)
- Record Type:
- Journal Article
- Title:
- Towards optimized scheduling for data‐intensive scientific workflow in multiple datacenter environment. (27th July 2015)
- Main Title:
- Towards optimized scheduling for data‐intensive scientific workflow in multiple datacenter environment
- Authors:
- Zhang, Jinghui
Wang, Mingjun
Luo, Junzhou
Dong, Fang
Zhang, Junxue
- Abstract:
- Summary: In the big data era, scientific workflow exhibits the characteristics of data intensity and becomes increasingly popular in scientific domains. Efficient scheduling of data‐intensive scientific workflow in a multiple datacenter (DC) environment has been a long‐standing challenge. Most of previous work on data‐intensive scientific workflow scheduling primarily focused on the optimization of reducing the volumes of data transfer between workflow tasks. In this paper, novel scheduling strategies for the execution of data‐intensive scientific workflow in multi‐DC environment are proposed aiming at the optimization of the overall data transfer time. A novel DC selection approach is proposed to minimize the number of DCs having enough storage capacity for the execution of scientific workflow as well as optimized inter‐DC network bandwidth for efficient data transfer between workflow tasks. A k‐means clustering‐based data placement strategy is adopted to intelligently place the initial data of scientific workflow thereby reducing the volume of initial data transfer between different DCs. A multilevel task replication scheduling strategy is invented to reduce the volumes of intermediate data transfer between DCs during the runtime of the scientific workflow. Simulations spanning a broad range of scientific workflow and multi‐DC settings are performed in order to verify the proposed approaches. The numerical results show that our combined scheduling strategy significantly reduces the overall data transfer time and data transfer volume when scientific workflow is scheduled in multi‐DC environment. Copyright © 2015 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 27:Number 18(2015:Dec.)
- Journal:
- Concurrency and computation
- Issue:
- Volume 27:Number 18(2015:Dec.)
- Issue Display:
- Volume 27, Issue 18 (2015)
- Year:
- 2015
- Volume:
- 27
- Issue:
- 18
- Issue Sort Value:
- 2015-0027-0018-0000
- Page Start:
- 5606
- Page End:
- 5622
- Publication Date:
- 2015-07-27
- Subjects:
- scientific workflow -- scheduling -- multiple datacenter
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3601
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 784.xml