NAS Parallel Benchmarks with CUDA and beyond. (28th November 2021)
- Record Type:
- Journal Article
- Title:
- NAS Parallel Benchmarks with CUDA and beyond. (28th November 2021)
- Main Title:
- NAS Parallel Benchmarks with CUDA and beyond
- Authors:
- Araujo, Gabriell
Griebler, Dalvan
Rockenbach, Dinei A.
Danelutto, Marco
Fernandes, Luiz G.
- Other Names:
- Chandrasekaran Sunita guestEditor.
Si Min guestEditor.
Zhai Jidong guestEditor.
Oden Lena guestEditor.
- Abstract:
- NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the evaluation of parallel hardware and software. Several research efforts from academia have made these benchmarks available with different parallel programming models beyond the original versions with OpenMP and MPI. This work joins these research efforts by providing a new CUDA implementation for NPB. Our contribution covers different aspects beyond the implementation. First, we define design principles based on the best programming practices for GPUs and apply them to each benchmark using CUDA. Second, we provide ease of use parametrization support for configuring the number of threads per block in our version. Third, we conduct a broad study on the impact of the number of threads per block in the benchmarks. Fourth, we propose and evaluate five strategies for helping to find a better number of threads per block configuration. The results have revealed relevant performance improvement solely by changing the number of threads per block, showing performance improvements from 8% up to 717% among the benchmarks. Fifth, we conduct a comparative analysis with the literature, evaluating performance, memory consumption, code refactoring required, and parallelism implementations. The performance results have shown up to 267% improvements over the best benchmarks versions available. We also observe the best and worst design choices, concerning code size and the performance trade‐off. Lastly, we highlight the challenges of implementing parallel CFD applications for GPUs and how the computations impact the GPU's behavior. …
- Is Part Of:
- Software, practice & experience. Volume 53:Number 1(2023)
- Journal:
- Software, practice & experience
- Issue:
- Volume 53:Number 1(2023)
- Issue Display:
- Volume 53, Issue 1 (2023)
- Year:
- 2023
- Volume:
- 53
- Issue:
- 1
- Issue Sort Value:
- 2023-0053-0001-0000
- Page Start:
- 53
- Page End:
- 80
- Publication Date:
- 2021-11-28
- Subjects:
- graphics processing units -- high‐performance computing -- NPB -- parallel applications -- parallel programming -- performance analysis
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/spe.3056
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 24724.xml