Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor. (November 2019)

Record Type:: Journal Article
Title:: Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor. (November 2019)
Main Title:: Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor
Authors:: Denis, Alexandre
Jaeger, Julien
Jeannot, Emmanuel
Pérache, Marc
Taboada, Hugo
Other Names:: Dongarra, Jack (guest editor)
Tourancheau, Bernard (guest editor)
Abstract:: To amortize the cost of MPI collective operations, nonblocking collectives have been proposed so as to allow communications to be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores leads to no overlap. In this article, we propose placement algorithms for progress threads that do not degrade performance when running on cores dedicated to communications to get communication/computation overlap. We first show that even simple collective operations, such as those based on a chain topology, are not straightforward to progress in the background on a dedicated core. Then, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on application cores, and the long but narrow part of the tree on one or several communication cores, so as to get a trade-off between overlap and absolute performance. We provide a model to study and predict its behavior and to tune its parameters. We implemented both algorithms in the Multi-Processor Computing (MPC) framework, which is a thread-based MPI implementation. We have run benchmarks on manycore processors such as the KNL and Skylake and get good results for …
Is Part Of:: International journal of high performance computing applications. Volume 33:Number 6(2019)
Journal:: International journal of high performance computing applications
Issue:: Volume 33:Number 6(2019)
Issue Display:: Volume 33, Issue 6 (2019)
Year:: 2019
Volume:: 33
Issue:: 6
Issue Sort Value:: 2019-0033-0006-0000
Page Start:: 1240
Page End:: 1254
Publication Date:: 2019-11
Subjects:: Nonblocking collectives -- MPI -- placement -- communication/computation overlap
High performance computing -- Periodicals
Supercomputers -- Periodicals
004.1105
Journal URLs:: http://hpc.sagepub.com
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
DOI:: 10.1177/1094342019860184
Languages:: English
ISSNs:: 1094-3420
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store