Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling. (13th August 2018)
- Record Type:
- Journal Article
- Title:
- Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling. (13th August 2018)
- Main Title:
- Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling
- Authors:
- Sun, Jian
Fu, Joshua S.
Drake, John B.
Zhu, Qingzhao
Haidar, Azzam
Gates, Mark
Tomov, Stanimire
Dongarra, Jack
- Abstract:
- Global chemistry‐climate models are computationally burdened as the chemical mechanisms become more complex and realistic. Optimization for graphics processing units (GPU) may make longer global simulation with regional detail possible, but limited study has been done to explore the potential benefit for the atmospheric chemistry modeling. Hence, in this study, the second‐order Rosenbrock solver of the chemistry module of CAM4‐Chem is ported to the GPU to gauge potential speed‐up. We find that on the CPU, the fastest performance is achieved using the Intel compiler with a block interleaved memory layout. Different combinations of compiler and memory layout lead to ~11.02× difference in the computational time. In contrast, the GPU version performs the best when using a combination of fully interleaved memory layout with block size equal to the warp size, CUDA streams for independent kernels, and constant memory. Moreover, the most efficient data transfer between CPU and GPU is gained by allocating the memory contiguously during the data initialization on the GPU. Compared to one CPU core, the speed‐up of using one GPU alone reaches a factor of ~11.7× for the computation alone and ~3.82× when the data transfer between CPU and GPU is considered. Using one GPU alone is also generally faster than the multithreaded implementation for 16 CPU cores in a compute node and the single‐source solution (OpenACC). The best performance is achieved by the implementation of the hybrid CPU/GPU version, but rescheduling the workload among the CPU cores is required before the practical CAM4‐Chem simulation.
Key Points:
A combination of fully interleaved memory layout, CUDA streams, and constant memory yields the best performance on the GPU
The Intel compiler with block‐interleaved memory layout provides the best performance on the CPU
The GPU version achieves a factor of ~11.7× speed‐up for computation alone and ~3.82× speed‐up when the data transfer is considered
- Is Part Of:
- Journal of advances in modeling earth systems. Volume 10:Number 8(2018)
- Journal:
- Journal of advances in modeling earth systems
- Issue:
- Volume 10:Number 8(2018)
- Issue Display:
- Volume 10, Issue 8 (2018)
- Year:
- 2018
- Volume:
- 10
- Issue:
- 8
- Issue Sort Value:
- 2018-0010-0008-0000
- Page Start:
- 1952
- Page End:
- 1969
- Publication Date:
- 2018-08-13
- Subjects:
- GPU -- CUDA -- compiler -- memory layout -- data transfer -- hybrid
Geological modeling -- Periodicals
Climatology -- Periodicals
Geochemical modeling -- Periodicals
551.5011
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1942-2466
http://onlinelibrary.wiley.com/
http://adv-model-earth-syst.org/
- DOI:
- 10.1029/2018MS001276
- Languages:
- English
- ISSNs:
- 1942-2466
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 7757.xml