Performance optimization of Sparse Matrix‐Vector Multiplication for multi‐component PDE‐based applications using GPUs. (21st May 2016)
- Record Type:
- Journal Article
- Title:
- Performance optimization of Sparse Matrix‐Vector Multiplication for multi‐component PDE‐based applications using GPUs. (21st May 2016)
- Main Title:
- Performance optimization of Sparse Matrix‐Vector Multiplication for multi‐component PDE‐based applications using GPUs
- Authors:
- Abdelfattah, Ahmad
Ltaief, Hatem
Keyes, David
Dongarra, Jack
- Other Names:
- Notare, Mirela Sechi Moretti Annoni, guest editor.
Lengauer, Christian, guest editor.
Bougé, Luc, guest editor.
Träff, Jesper Larsson, guest editor.
- Abstract:
- Summary: Simulations of many multi‐component PDE‐based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory‐bound Sparse Matrix‐Vector Multiplication (SpMV) is the most time‐consuming operation in such solvers. Multi‐species models produce Jacobians with a dense block structure, where the block size can be as large as a few dozen. Failing to exploit this dense block structure vastly underutilizes hardware capable of delivering high performance on dense BLAS operations. This paper presents a GPU‐accelerated SpMV kernel for block‐sparse matrices. Dense matrix‐vector multiplications within the sparse‐block structure leverage optimization techniques from the KBLAS library, a high performance library for dense BLAS kernels. The design ideas of KBLAS can be applied to block‐sparse matrices. Furthermore, a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows. Multi‐GPU performance is highlighted. The proposed SpMV kernel outperforms existing state‐of‐the‐art implementations using matrices with real structures from different applications. Copyright © 2016 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 28:Number 12(2016)
- Journal:
- Concurrency and computation
- Issue:
- Volume 28:Number 12(2016)
- Issue Display:
- Volume 28, Issue 12 (2016)
- Year:
- 2016
- Volume:
- 28
- Issue:
- 12
- Issue Sort Value:
- 2016-0028-0012-0000
- Page Start:
- 3447
- Page End:
- 3465
- Publication Date:
- 2016-05-21
- Subjects:
- sparse matrix‐vector multiplication -- GPU optimizations -- block sparse matrices
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3874
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 2620.xml