PRBP: A prioritized replica balancing policy for HDFS balancer. (15th November 2022)
- Record Type:
- Journal Article
- Title:
- PRBP: A prioritized replica balancing policy for HDFS balancer. (15th November 2022)
- Main Title:
- PRBP: A prioritized replica balancing policy for HDFS balancer
- Authors:
- Fazul, Rhauani Weber Aita
Barcelos, Patrícia Pitthan
- Abstract:
- Data replication is the main fault tolerance mechanism implemented by the Apache Hadoop Distributed File System (HDFS). The placement of data across the cluster directly affects replica balancing and data locality. The HDFS Balancer is the native solution for rebalancing the data distribution by moving blocks from over‐utilized to under‐utilized nodes. Nevertheless, its current balancing policy does not address the characteristics and specific needs of applications during data rearrangement. In this work, we present the PRBP, a customized replica balancing policy for the HDFS Balancer. The PRBP is based on a system of priorities that can be adapted and configured to different usage demands, whether these relate to heterogeneous environments or focus on improving data reliability and availability. The priorities define whether system metrics or aspects of the cluster topology should be considered during the execution of the HDFS Balancer, making the process of replica balancing in HDFS more flexible. Based on the priority system, we determine association rules that allow multiple priorities to be used simultaneously. Along with these rules, we present guidelines for using the PRBP as a specialized solution in scenarios that can benefit from reactive replica balancing. In addition, we conducted practical experiments to highlight the behavior of the PRBP and the applicability of its guidelines for prioritizing replica rearrangement in the file system.
- Is Part Of:
- Software, practice & experience. Volume 53:Number 3(2023)
- Journal:
- Software, practice & experience
- Issue:
- Volume 53:Number 3(2023)
- Issue Display:
- Volume 53, Issue 3 (2023)
- Year:
- 2023
- Volume:
- 53
- Issue:
- 3
- Issue Sort Value:
- 2023-0053-0003-0000
- Page Start:
- 600
- Page End:
- 630
- Publication Date:
- 2022-11-15
- Subjects:
- balancing policy -- data availability -- data replication -- distributed file systems -- fault tolerance -- reliability
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/spe.3162
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 25700.xml