Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor. (November 2019)
- Record Type:
- Journal Article
- Title:
- Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor. (November 2019)
- Main Title:
- Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor
- Authors:
- Denis, Alexandre
Jaeger, Julien
Jeannot, Emmanuel
Pérache, Marc
Taboada, Hugo
- Other Names:
- Dongarra, Jack, guest editor
Tourancheau, Bernard, guest editor
- Abstract:
- To amortize the cost of MPI collective operations, nonblocking collectives have been proposed so as to allow communications to be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores leads to no overlap. In this article, we propose placement algorithms for progress threads that obtain communication/computation overlap by running on cores dedicated to communications, without degrading performance. We first show that even simple collective operations, such as those based on a chain topology, are not straightforward to progress in the background on a dedicated core. Then, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on application cores and the long but narrow part of the tree on one or several communication cores, so as to strike a trade-off between overlap and absolute performance. We provide a model to study and predict its behavior and to tune its parameters. We implemented both algorithms in the MultiProcessor Computing (MPC) framework, a thread-based MPI implementation. We ran benchmarks on manycore processors such as Intel Knights Landing (KNL) and Skylake and obtained good results for both performance and overlap.
- Is Part Of:
- International journal of high performance computing applications. Volume 33, Number 6 (2019)
- Journal:
- International journal of high performance computing applications
- Issue:
- Volume 33, Number 6 (2019)
- Issue Display:
- Volume 33, Issue 6 (2019)
- Year:
- 2019
- Volume:
- 33
- Issue:
- 6
- Page Start:
- 1240
- Page End:
- 1254
- Publication Date:
- 2019-11
- Subjects:
- Nonblocking collectives -- MPI -- placement -- communication/computation overlap
High performance computing -- Periodicals
Supercomputers -- Periodicals
004.1105
- Journal URLs:
- http://hpc.sagepub.com
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
- DOI:
- 10.1177/1094342019860184
- Languages:
- English
- ISSNs:
- 1094-3420
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11258.xml