A two steps method of resources utilization predication for large Hadoop data center. (8th January 2020)
- Record Type:
- Journal Article
- Title:
- A two steps method of resources utilization predication for large Hadoop data center. (8th January 2020)
- Main Title:
- A two steps method of resources utilization predication for large Hadoop data center
- Authors:
- Yu, Lei
Teng, Fei
Ning, Shangming
Li, Yunshu
Cui, Zhe
Du, Shengdong
- Other Names:
- Drira Khalil guestEditor.
Jmaiel Mohamed guestEditor.
Lastovetsky Alexey L. guestEditor.
Manumachu Ravi Reddy guestEditor.
- Abstract:
- Summary: With the increase of data processing and Hadoop data center construction requirements, the performance of Hadoop data center is limited by inappropriate resources utilization. This paper introduces a new method to predict utilization for large‐scale Hadoop clusters. The new method adopts a two steps model, which includes Hadoop applications' performance simulation and resources utilization prediction. For performance simulation, a new simulator, which integrates baseline test and multilayered network model, is introduced and implemented. A resources utilization predictor is proposed in the second step. By analyzing the pattern of resources utilization, a single task model is proposed. A parallel‐batch‐task‐based (PBT) model, which represents the behavior of real Hadoop applications by integrating the single task model, is introduced. Two test scenarios are configured to verify the performance of our method. For the data center scenario, Terasort, Wordcount, and Hive are selected as benchmarks. In the virtual machines scenario, Terasort is used as benchmark. The experiments show that the error comparing between the simulator results and experimental environment results in most cases is less than 10%. The results confirm that we can locate the resource bottleneck for Hadoop clusters, meanwhile we can agilely configure clusters for applications with massive data.
- Is Part Of:
- Concurrency and computation. Volume 32:Number 15(2020)
- Journal:
- Concurrency and computation
- Issue:
- Volume 32:Number 15(2020)
- Issue Display:
- Volume 32, Issue 15 (2020)
- Year:
- 2020
- Volume:
- 32
- Issue:
- 15
- Issue Sort Value:
- 2020-0032-0015-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2020-01-08
- Subjects:
- baseline test -- data center -- Hadoop 2 -- resources utilization
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.5634
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13336.xml