Silent error detection in numerical time-stepping schemes. (November 2015)
- Record Type:
- Journal Article
- Title:
- Silent error detection in numerical time-stepping schemes. (November 2015)
- Main Title:
- Silent error detection in numerical time-stepping schemes
- Authors:
- Benson, Austin R
Schmit, Sven
Schreiber, Robert
- Abstract:
- Errors due to hardware or low-level software problems, if detected, can be fixed by various schemes, such as recomputation from a checkpoint. Silent errors are errors in application state that have escaped low-level error detection. At extreme scale, where machines can perform astronomically many operations per second, silent errors threaten the validity of computed results. We propose a new paradigm for detecting silent errors at the application level. Our central idea is to frequently compare computed values to those provided by a cheap checking computation, and to build error detectors based on the difference between the two output sequences. Numerical analysis provides us with usable checking computations for the solution of initial-value problems in ODEs and PDEs, arguably the most common problems in computational science. Here, we provide, optimize, and test methods based on Runge–Kutta and linear multistep methods for ODEs, and on implicit and explicit finite difference schemes for PDEs. We take the heat equation and Navier–Stokes equations as examples. In tests with artificially injected errors, this approach effectively detects almost all meaningful errors, without significant slowdown.
- Is Part Of:
- International journal of high performance computing applications. Volume 29:Number 4(2015:Winter)
- Journal:
- International journal of high performance computing applications
- Issue:
- Volume 29:Number 4(2015:Winter)
- Issue Display:
- Volume 29, Issue 4 (2015)
- Year:
- 2015
- Volume:
- 29
- Issue:
- 4
- Issue Sort Value:
- 2015-0029-0004-0000
- Page Start:
- 403
- Page End:
- 421
- Publication Date:
- 2015-11
- Subjects:
- Silent errors -- resilience -- Runge-Kutta -- linear multi-step methods -- heat equation -- initial-value problems
High performance computing -- Periodicals
Supercomputers -- Periodicals
004.1105
- Journal URLs:
- http://hpc.sagepub.com
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
- DOI:
- 10.1177/1094342014532297
- Languages:
- English
- ISSNs:
- 1094-3420
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
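The abstract's central idea is to compare each computed value with a cheap checking computation and flag steps where the two disagree. That idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' detector. The following are all hypothetical choices made here: the scalar test problem y' = -y, classical RK4 as the primary method, the two-step Adams–Bashforth formula as the checker, the fixed threshold `tol`, the injected perturbation, and every function name. The checker is cheap because it reuses the derivative value `k1` that the RK4 step already computes, plus one stored value from the previous step.

```python
from typing import Callable, List, Optional, Tuple

Deriv = Callable[[float, float], float]


def rk4_step(f: Deriv, t: float, y: float, h: float, k1: float) -> float:
    """One classical RK4 step; k1 = f(t, y) is passed in so the checker can reuse it."""
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def ab2_check(y: float, h: float, f_curr: float, f_prev: float) -> float:
    """Cheap checker: two-step Adams-Bashforth, built only from stored derivative values."""
    return y + h * (1.5 * f_curr - 0.5 * f_prev)


def integrate_with_detector(
    f: Deriv,
    y0: float,
    h: float,
    n_steps: int,
    tol: float,
    inject_at: Optional[int] = None,
    bump: float = 0.0,
) -> Tuple[List[float], List[int]]:
    """Integrate y' = f(t, y), flagging steps where RK4 and AB2 differ by more than tol.

    inject_at/bump simulate a one-off silent error in the primary result (hypothetical).
    A flagged step is recomputed, standing in for rollback and recomputation.
    """
    ys = [y0]
    flagged: List[int] = []
    f_prev: Optional[float] = None
    t = 0.0
    for n in range(n_steps):
        y = ys[-1]
        k1 = f(t, y)  # shared by the primary step and the checker
        y_new = rk4_step(f, t, y, h, k1)
        if n == inject_at:
            y_new += bump  # simulated silent corruption of the primary result
        if f_prev is not None:  # AB2 needs one step of history, so step 0 is unchecked
            y_chk = ab2_check(y, h, k1, f_prev)
            if abs(y_new - y_chk) > tol:
                flagged.append(n)
                y_new = rk4_step(f, t, y, h, k1)  # recompute the suspect step
        ys.append(y_new)
        f_prev = k1
        t += h
    return ys, flagged


if __name__ == "__main__":
    ys, flagged = integrate_with_detector(
        lambda t, y: -y, 1.0, 0.01, 100, tol=1e-5, inject_at=50, bump=1e-3
    )
    print(flagged)  # [50]
```

With h = 0.01 the honest RK4/AB2 discrepancy on this problem is of order h^3, well below `tol = 1e-5`, so only the injected perturbation is flagged. A perturbation smaller than `tol` would pass unnoticed, which is why the choice of threshold matters. The paper's own detectors and thresholds are not described in this record.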