Towards self‐caring MapReduce: a study of performance penalties under faults. (28th May 2013)
- Record Type:
- Journal Article
- Title:
- Towards self‐caring MapReduce: a study of performance penalties under faults. (28th May 2013)
- Main Title:
- Towards self‐caring MapReduce: a study of performance penalties under faults
- Authors:
- Kadirvel, Selvi
Fortes, José A.B.
Smari, Waleed
Fiore, Sandro
Trinitis, Carsten
Ranjan, Rajiv
Buyya, Rajkumar
Nepal, Surya
Georgakopulos, Dimitrios
- Abstract:
- Summary: Self‐caring IT systems are those that can proactively avoid system failures rather than reactively handle failures after they have occurred. In this paper, we focus on failures in which a MapReduce job is unable to execute within a service‐level‐agreement‐based completion time. The existing fault‐tolerance capability provided by MapReduce frameworks such as Hadoop is simple, and the penalty associated with handling faults could potentially lead to excessive job execution times. Our goal in this paper is to bring out the severity of this penalty for different job and framework parameters. We quantitatively evaluate the penalty in execution time associated with node faults using the MRPerf simulator. We then perform an empirical study of penalties on a virtualized testbed consisting of Xen domains, by varying system characteristics along four dimensions: hardware, application, dataset, and fault types. Through simulation and empirical results, we show that job‐completion‐time service‐level agreement violations can be reduced using dynamic resource scaling. Scaling leverages the elastic properties of a virtualized environment to mitigate execution time penalties and hence proactively avoids a potential job failure. We show that, using resource scaling, performance penalties can be decreased to less than 5% of the no‐fault execution time at minimal additional cost. Copyright © 2013 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 27:Number 9(2015:Jun.)
- Journal:
- Concurrency and computation
- Issue:
- Volume 27:Number 9(2015:Jun.)
- Issue Display:
- Volume 27, Issue 9 (2015)
- Year:
- 2015
- Volume:
- 27
- Issue:
- 9
- Issue Sort Value:
- 2015-0027-0009-0000
- Page Start:
- 2310
- Page End:
- 2328
- Publication Date:
- 2013-05-28
- Subjects:
- Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3044
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 4078.xml