A checkpoint compression study for high-performance computing systems. (November 2015)
- Record Type:
- Journal Article
- Title:
- A checkpoint compression study for high-performance computing systems. (November 2015)
- Main Title:
- A checkpoint compression study for high-performance computing systems
- Authors:
- Ibtesham, Dewan
Ferreira, Kurt B
Arnold, Dorian
- Abstract:
- As high-performance computing systems continue to increase in size and complexity, higher failure rates and increased overheads for checkpoint/restart (CR) protocols have raised concerns about the practical viability of CR protocols for future systems. Previously, compression has proven to be a viable approach for reducing checkpoint data volumes and, thereby, reducing CR protocol overhead, leading to improved application performance. In this article, we further explore compression-based CR optimization by examining its baseline performance and scaling properties, evaluating whether improved compression algorithms might lead to even better application performance, and comparing checkpoint compression against and alongside other software- and hardware-based optimizations. The highlights of our results are that: (1) compression is a very viable CR optimization; (2) generic, text-based compression algorithms appear to perform near optimally for checkpoint data compression, and faster compression algorithms will not lead to better application performance; (3) compression-based optimizations fare well against and alongside other software-based optimizations; and (4) while hardware-based optimizations outperform software-based ones, they are not as cost effective.
- Is Part Of:
- International journal of high performance computing applications. Volume 29:Number 4(2015:Winter)
- Journal:
- International journal of high performance computing applications
- Issue:
- Volume 29:Number 4(2015:Winter)
- Issue Display:
- Volume 29, Issue 4 (2015)
- Year:
- 2015
- Volume:
- 29
- Issue:
- 4
- Issue Sort Value:
- 2015-0029-0004-0000
- Page Start:
- 387
- Page End:
- 402
- Publication Date:
- 2015-11
- Subjects:
- Fault tolerance -- checkpoint/restart -- checkpoint compression
High performance computing -- Periodicals
Supercomputers -- Periodicals
004.1105
- Journal URLs:
- http://hpc.sagepub.com
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
- DOI:
- 10.1177/1094342015570921
- Languages:
- English
- ISSNs:
- 1094-3420
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 6772.xml