A fine-grained block ILU scheme on regular structures for GPGPUs. (22nd September 2015)
- Record Type:
- Journal Article
- Title:
- A fine-grained block ILU scheme on regular structures for GPGPUs. (22nd September 2015)
- Main Title:
- A fine-grained block ILU scheme on regular structures for GPGPUs
- Authors:
- Luo, Lixiang
Edwards, Jack R.
Luo, Hong
Mueller, Frank
- Abstract:
- Highlights: A fine-grained block ILU (FGBILU) scheme has been implemented using OpenACC and CUDA. A fully vectorized inversion algorithm using Gauss–Jordan elimination is developed. FGBILU remains mathematically identical to sequential BILU. FGBILU provides excellent speedup over sequential BILU on a CPU. FGBILU has been fully incorporated and validated in a legacy CFD solver, INCOMP3D.
Abstract: Iterative methods based on block incomplete LU (BILU) factorization are considered highly effective for solving large-scale block-sparse linear systems resulting from coupled PDE systems with $n$ equations. However, efforts to port implicit PDE solvers to massively parallel shared-memory heterogeneous architectures, such as general-purpose graphics processing units (GPGPUs), have largely avoided BILU, leaving the enormous performance potential of GPGPUs unfulfilled in many applications where implicit schemes and BILU-type preconditioners/solvers are highly preferred. Indeed, the strong inherent data dependency and the high memory bandwidth demanded by block matrix operations render naive adoption of existing sequential BILU algorithms extremely inefficient on GPGPUs. In this study, we present a fine-grained BILU (FGBILU) scheme that is particularly effective on GPGPUs. A straightforward one-sweep wavefront ordering is employed to resolve data dependency. Granularity is substantially refined because block matrix operations are carried out in a truly element-wise fashion. In particular, the inversion of diagonal blocks, a well-known bottleneck, is accomplished by a parallel in-place Gauss–Jordan elimination. As a result, FGBILU is able to offer low-overhead concurrent computation at $O(n^2 N^2)$ scale on a 3D PDE domain with linear size $N$. FGBILU has been implemented with both OpenACC and CUDA and tested as a block-sparse linear solver on a structured 3D grid. While FGBILU remains mathematically identical to sequential global BILU, numerical experiments confirm its exceptional performance on an Nvidia GPGPU. …
- Is Part Of:
- Computers & fluids. Volume 119(2015)
- Journal:
- Computers & fluids
- Issue:
- Volume 119(2015)
- Issue Display:
- Volume 119, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 119
- Issue:
- 2015
- Issue Sort Value:
- 2015-0119-2015-0000
- Page Start:
- 149
- Page End:
- 161
- Publication Date:
- 2015-09-22
- Subjects:
- Block ILU -- Block-sparse linear systems -- Wavefront scheme -- GPGPU -- OpenACC -- CUDA
Fluid dynamics -- Data processing -- Periodicals
532.050285
- Journal URLs:
- http://www.journals.elsevier.com/computers-and-fluids/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compfluid.2015.07.005
- Languages:
- English
- ISSNs:
- 0045-7930
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.690000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8707.xml
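The one-sweep wavefront ordering mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a hyperplane ordering in which cell (i, j, k) belongs to level i + j + k, and a 7-point stencil whose lower-triangular neighbours are (i-1, j, k), (i, j-1, k) and (i, j, k-1); neither detail is stated in this record. The helper names (`wavefront_levels`, `forward_solve`) and the constant coefficients `DIAG`/`OFF` are hypothetical, and for simplicity the blocks are scalars (n = 1). The sketch checks that a lower-triangular sweep done level by level, with cells inside each level visited in arbitrary (here reversed) order, reproduces the lexicographic sequential sweep exactly. That is one way a wavefront scheme can remain mathematically identical to a sequential sweep; it is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical constant coefficients of a scalar (n = 1) lower factor L.
DIAG = 4.0
OFF = -1.0


def wavefront_levels(nx, ny, nz):
    """Group cells of an nx*ny*nz grid into hyperplane levels d = i + j + k."""
    levels = defaultdict(list)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                levels[i + j + k].append((i, j, k))
    # Levels run from 0 to (nx - 1) + (ny - 1) + (nz - 1).
    return [levels[d] for d in range(nx + ny + nz - 2)]


def lower_neighbours(cell):
    """Lower-triangular neighbours of a cell under an assumed 7-point stencil."""
    i, j, k = cell
    return ((i - 1, j, k), (i, j - 1, k), (i, j, k - 1))


def forward_solve(order, b):
    """Solve L x = b, visiting cells in `order`.

    Raises KeyError if a cell is visited before one of its lower neighbours,
    i.e. if `order` violates the data dependency.
    """
    x = {}
    for cell in order:
        s = b[cell]
        for nb in lower_neighbours(cell):
            if nb in b:  # skip neighbours outside the grid
                s -= OFF * x[nb]
        x[cell] = s / DIAG
    return x


if __name__ == "__main__":
    N = 8
    # Lexicographic (sequential) order: every lower neighbour comes earlier.
    cells = [(i, j, k) for i in range(N) for j in range(N) for k in range(N)]
    b = {c: 1.0 + (7 * c[0] + 3 * c[1] + c[2]) % 5 for c in cells}
    levels = wavefront_levels(N, N, N)

    # Wavefront order: level by level; inside a level any order is valid,
    # so each level could map to one parallel kernel launch on a GPU.
    wave_order = [c for level in levels for c in reversed(level)]

    print(forward_solve(cells, b) == forward_solve(wave_order, b))  # True
    print(len(levels), max(len(level) for level in levels))  # 22 48
```

For N = 8 there are 3N - 2 = 22 levels and the widest holds 48 = 3N²/4 cells. If each cell's n × n block operations are further split into n² element-wise tasks, the widest level exposes work on the order of n²N², which is one way to read the $O(n^2 N^2)$ concurrency quoted in the abstract.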
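The abstract attributes the diagonal-block inversion to a parallel in-place Gauss–Jordan elimination. The sequential Python sketch below shows an in-place Gauss–Jordan inversion of a single n × n block. The function name is hypothetical, it omits pivoting (so it assumes every pivot is nonzero, as is typical for diagonally dominant blocks; this record does not say how the authors handle pivots), and it is not the authors' OpenACC/CUDA kernel. Its purpose is to show why the step can be fine-grained: for a fixed pivot p, each entry's new value depends only on its own old value and on the pivot row and column from before that step, so all n² updates could in principle be assigned to separate GPU threads.

```python
def gauss_jordan_invert_inplace(a):
    """Overwrite the n x n matrix `a` (list of lists of floats) with its inverse.

    No pivoting: every pivot encountered must be nonzero.
    """
    n = len(a)
    for p in range(n):
        piv = a[p][p]
        if piv == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        inv = 1.0 / piv
        # Snapshot the pivot row and column. Given these, each of the n*n
        # updates below is independent (one GPU thread per entry in principle).
        row = a[p][:]
        col = [a[i][p] for i in range(n)]
        for i in range(n):
            for j in range(n):
                if i == p and j == p:
                    a[i][j] = inv
                elif i == p:
                    a[i][j] = row[j] * inv
                elif j == p:
                    a[i][j] = -col[i] * inv
                else:
                    a[i][j] -= col[i] * row[j] * inv
    return a


if __name__ == "__main__":
    block = [[4.0, 7.0], [2.0, 6.0]]
    gauss_jordan_invert_inplace(block)
    print([[round(v, 12) for v in r] for r in block])  # [[0.6, -0.7], [-0.2, 0.4]]
```

Storing the inverse in the same array avoids a separate n × n workspace per block, which matters when thousands of blocks on a wavefront are inverted at once; the branch-free alternative (writing all four cases as one formula) is a common refinement for GPU kernels but is left out here for readability.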