Application-bypass reduction for large-scale clusters. (6th February 2006)
- Record Type:
- Journal Article
- Title:
- Application-bypass reduction for large-scale clusters. (6th February 2006)
- Main Title:
- Application-bypass reduction for large-scale clusters
- Authors:
- Wagner, Adam
Buntinas, Darius
Brightwell, Ron
Panda, Dhabaleswar K.
- Abstract:
- Process skew is an important factor in the performance of parallel applications, especially in large-scale clusters. Reduction is a common collective operation which, by its nature, introduces implicit synchronisation between the processes involved in the communication and is therefore highly susceptible to performance degradation due to process skew. A collective operation with application-bypass does not require the application to block in order for the operation to make progress. Application-bypass collective operations are therefore highly tolerant of skew. In this paper, we describe the design and implementation of an application-bypass version of the reduction operation in MPICH over GM. We evaluate our implementation on a 32-node cluster. Under conditions of process skew we find a factor of improvement of up to 5.1 for our application-bypass reduction versus the default MPICH implementation. In addition, we see that this factor of improvement increases with system size, indicating that the application-bypass implementation is more scalable and skew-tolerant than the default non-application-bypass version. This framework promises design and development of high-performance and scalable collective communication libraries for next-generation large-scale clusters.
- Is Part Of:
- International journal of high performance computing and networking. Volume 2:Number 2/3/4(2004)
- Journal:
- International journal of high performance computing and networking
- Issue:
- Volume 2:Number 2/3/4(2004)
- Issue Display:
- Volume 2, Issue 2/3/4 (2004)
- Year:
- 2004
- Volume:
- 2
- Issue:
- 2/3/4
- Issue Sort Value:
- 2004-0002-NaN-0000
- Page Start:
- 99
- Page End:
- 109
- Publication Date:
- 2006-02-06
- Subjects:
- application-bypass -- reduction -- collective communications -- process skew -- heterogeneous -- cluster computing -- MPI -- MPICH -- GM -- Myrinet
High performance computing -- Periodicals
Computer networks -- Periodicals
High performance computing
Periodicals
004.05
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijhpcn
http://www.metapress.com/openurl.asp?genre=journal&issn=1740-0562
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1740-0562
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 8687.xml