Exploring the feasibility of lossy compression for PDE simulations. (March 2019)
- Record Type:
- Journal Article
- Title:
- Exploring the feasibility of lossy compression for PDE simulations. (March 2019)
- Main Title:
- Exploring the feasibility of lossy compression for PDE simulations
- Authors:
- Calhoun, Jon
Cappello, Franck
Olson, Luke N
Snir, Marc
Gropp, William D
- Abstract:
- Checkpoint restart plays an important role in high-performance computing (HPC) applications, allowing simulation runtime to extend beyond a single job allocation and facilitating recovery from hardware failure. Yet, as machines grow in size and in complexity, traditional approaches to checkpoint restart are becoming prohibitive. Current methods store a subset of the application's state and exploit the memory hierarchy in the machine. However, as the energy cost of data movement continues to dominate, further reductions in checkpoint size are needed. Lossy compression, which can significantly reduce checkpoint sizes, offers a potential to reduce computational cost in checkpoint restart. This article investigates the use of numerical properties of partial differential equation (PDE) simulations, such as bounds on the truncation error, to evaluate the feasibility of using lossy compression in checkpointing PDE simulations. Restart from a checkpoint with lossy compression is considered for a fail-stop error in two time-dependent HPC application codes: PlasComCM and Nek5000. Results show that error in application variables due to a restart from a lossy compressed checkpoint can be masked by the numerical error in the discretization, leading to increased efficiency in checkpoint restart without influencing overall accuracy in the simulation.
- Is Part Of:
- International journal of high performance computing applications. Volume 33:Number 2(2019)
- Journal:
- International journal of high performance computing applications
- Issue:
- Volume 33:Number 2(2019)
- Issue Display:
- Volume 33, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 33
- Issue:
- 2
- Issue Sort Value:
- 2019-0033-0002-0000
- Page Start:
- 397
- Page End:
- 410
- Publication Date:
- 2019-03
- Subjects:
- Lossy compression -- checkpoint restart -- exascale -- error tolerance selection -- error propagation -- fault tolerance -- compression
High performance computing -- Periodicals
Supercomputers -- Periodicals
004.1105
- Journal URLs:
- http://hpc.sagepub.com
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
- DOI:
- 10.1177/1094342018762036
- Languages:
- English
- ISSNs:
- 1094-3420
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9706.xml