A framework for dense triangular matrix kernels on various manycore architectures. (5th June 2017)
- Record Type:
- Journal Article
- Title:
- A framework for dense triangular matrix kernels on various manycore architectures. (5th June 2017)
- Main Title:
- A framework for dense triangular matrix kernels on various manycore architectures
- Authors:
- Charara, Ali
Keyes, David
Ltaief, Hatem
- Other Names:
- Lengauer, Christian (guest editor)
Bougé, Luc (guest editor)
Trystram, Denis (guest editor)
Balaji, Pavan (guest editor)
Leung, Kai‐Cheung (guest editor)
- Abstract:
- Summary: We present a new high‐performance framework for dense triangular Basic Linear Algebra Subroutines (BLAS) kernels, i.e., triangular matrix‐matrix multiplication (TRMM) and triangular solve (TRSM), on various manycore architectures. This work extends our previous single‐GPU study, presented at the EuroPar'16 conference, in which we demonstrated the effectiveness of recursive formulations in enhancing the performance of these kernels. In this paper, the performance of triangular BLAS kernels on a single GPU is further enhanced by implementing customized in‐place CUDA kernels for TRMM and TRSM, which are called at the bottom of the recursion. In addition, we propose a multi‐GPU implementation of TRMM and TRSM and show almost linear performance scaling as the number of GPUs increases. Finally, the recursive formulation of these triangular BLAS kernels is oblivious to the targeted hardware architecture. We therefore port these recursive kernels to homogeneous x86 hardware architectures by relying on vendor‐optimized BLAS implementations. Results reported on various hardware architectures highlight a significant performance improvement over state‐of‐the‐art implementations. These new kernels are freely available in the KAUST BLAS (KBLAS) open‐source library at https://github.com/ecrc/kblas.
- Is Part Of:
- Concurrency and computation. Volume 29:Number 15(2017)
- Journal:
- Concurrency and computation
- Issue:
- Volume 29:Number 15(2017)
- Issue Display:
- Volume 29, Issue 15 (2017)
- Year:
- 2017
- Volume:
- 29
- Issue:
- 15
- Issue Sort Value:
- 2017-0029-0015-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2017-06-05
- Subjects:
- dense triangular matrix computations -- KBLAS -- manycore optimizations -- recursive formulation
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.4187
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 2817.xml
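
Illustrative note on the abstract: the recursive formulation it mentions can be sketched for one TRSM variant, solving A X = B with A lower triangular, applied from the left and not transposed. This is a minimal pure-Python sketch and not the KBLAS API. The names `rec_trsm`, `trsm_base`, `gemm_update`, and the `leaf` threshold are hypothetical, and the split into two half-size triangular solves joined by one matrix-matrix update is assumed to be the standard 2x2 block recursion. In the paper's setting the base case would be a custom in-place CUDA kernel and the update a vendor GEMM call; plain loops stand in for both here.

```python
def trsm_base(A, B, lo, hi):
    """Forward substitution on the diagonal block A[lo:hi, lo:hi].

    Overwrites rows lo..hi-1 of B with the solution (in place).
    """
    ncols = len(B[0])
    for i in range(lo, hi):
        for j in range(ncols):
            acc = B[i][j] - sum(A[i][k] * B[k][j] for k in range(lo, i))
            B[i][j] = acc / A[i][i]


def gemm_update(A, B, lo, mid, hi):
    """Matrix-matrix update: B[mid:hi, :] -= A[mid:hi, lo:mid] @ B[lo:mid, :].

    Rows lo..mid-1 of B already hold solved values at this point.
    """
    ncols = len(B[0])
    for i in range(mid, hi):
        for j in range(ncols):
            B[i][j] -= sum(A[i][k] * B[k][j] for k in range(lo, mid))


def rec_trsm(A, B, lo=0, hi=None, leaf=2):
    """Recursively solve A X = B for lower-triangular A, overwriting B with X.

    `leaf` is a hypothetical block size at which the recursion stops.
    """
    if hi is None:
        hi = len(A)
    if hi - lo <= leaf:
        trsm_base(A, B, lo, hi)        # bottom of the recursion
        return
    mid = (lo + hi) // 2
    rec_trsm(A, B, lo, mid, leaf)      # solve with the top-left triangle
    gemm_update(A, B, lo, mid, hi)     # remove its contribution from the lower rows
    rec_trsm(A, B, mid, hi, leaf)      # solve with the bottom-right triangle
```

Calling `rec_trsm(A, B)` on nested lists of floats overwrites `B` with the solution. Most of the arithmetic lands in `gemm_update`, which is the step a tuned matrix-matrix multiply would take over in a high-performance setting.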