A new deadlock resolution protocol and message matching algorithm for the extreme‐scale simulator. (22nd March 2016)
- Record Type:
- Journal Article
- Title:
- A new deadlock resolution protocol and message matching algorithm for the extreme‐scale simulator. (22nd March 2016)
- Main Title:
- A new deadlock resolution protocol and message matching algorithm for the extreme‐scale simulator
- Authors:
- Engelmann, Christian
Naughton, Thomas
- Other Names:
- Notare, Mirela Sechi Moretti Annoni, guest editor.
Lengauer, Christian, guest editor.
Bougé, Luc, guest editor.
Träff, Jesper Larsson, guest editor.
- Abstract:
- Summary: Investigating the performance of parallel applications at scale on future high‐performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co‐design. The Extreme‐scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The xSim toolkit strives to limit simulation overheads in order to maintain performance and productivity criteria. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management cost. These enhancements resulted in significant performance improvements. The simulation overhead for running the NASA Advanced Supercomputing Parallel Benchmark suite dropped from 1,020% to 238% for the conjugate gradient benchmark and 102% to 0% for the embarrassingly parallel benchmark. Additionally, the improvements were beneficial for reducing overheads in the highly accurate simulation mode of xSim, which is useful for resilience investigation studies for tracking intentional MPI process failures. In the highly accurate mode, the simulation overhead was reduced from 37,511% to 13,808% for conjugate gradient and from 3,332% to 204% for embarrassingly parallel. Copyright © 2016 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 28:Number 12(2016)
- Journal:
- Concurrency and computation
- Issue:
- Volume 28:Number 12(2016)
- Issue Display:
- Volume 28, Issue 12 (2016)
- Year:
- 2016
- Volume:
- 28
- Issue:
- 12
- Issue Sort Value:
- 2016-0028-0012-0000
- Page Start:
- 3369
- Page End:
- 3389
- Publication Date:
- 2016-03-22
- Subjects:
- performance prediction -- message passing interface -- parallel discrete event simulation -- high‐performance computing
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3805
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 2620.xml