A CUDA fast multipole method with highly efficient M2L far field evaluation. (January 2021)
- Record Type:
- Journal Article
- Title:
- A CUDA fast multipole method with highly efficient M2L far field evaluation. (January 2021)
- Main Title:
- A CUDA fast multipole method with highly efficient M2L far field evaluation
- Authors:
- Kohnke, Bartosz
Kutzner, Carsten
Beckmann, Andreas
Lube, Gert
Kabadshow, Ivo
Dachsel, Holger
Grubmüller, Helmut
- Other Names:
- Wyrzykowski, Roman (guest editor)
Deelman, Ewa (guest editor)
- Abstract:
- Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naïve pairwise summation has $O(N^2)$ computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to $O(N)$ for any specified precision. Here, we present a CUDA-accelerated, C++ FMM implementation for multi-particle systems with $r^{-1}$ potential that are found, e.g., in biomolecular simulations. The algorithm involves several operators to exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance. We propose, implement and benchmark three different M2L parallelization approaches. Approach (1) utilizes Unified Memory to minimize programming and porting efforts. It achieves decent speedups with little implementation work. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications. The result is a compute-bound implementation, i.e. performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver. …
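For orientation, the naïve pairwise summation that the abstract contrasts with the FMM can be sketched in a few lines of C++. This is only an illustrative sketch of the direct method for an $r^{-1}$ potential in arbitrary units, not code from the article or from GROMACS; the `Particle` type and the `direct_potential` function are hypothetical names introduced here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle type for illustration; not taken from the article's code.
struct Particle {
    double x, y, z;  // position
    double q;        // charge (or mass, for the gravitational case)
};

// Direct summation: phi_i = sum over j != i of q_j / |r_i - r_j|.
// The two nested loops over all particles give the O(N^2) cost.
std::vector<double> direct_potential(const std::vector<Particle>& p) {
    std::vector<double> phi(p.size(), 0.0);
    for (std::size_t i = 0; i < p.size(); ++i) {
        for (std::size_t j = 0; j < p.size(); ++j) {
            if (i == j) continue;  // skip self-interaction
            const double dx = p[i].x - p[j].x;
            const double dy = p[i].y - p[j].y;
            const double dz = p[i].z - p[j].z;
            phi[i] += p[j].q / std::sqrt(dx * dx + dy * dy + dz * dz);
        }
    }
    return phi;
}
```

Doubling the particle count quadruples the number of inner-loop iterations, which is the scaling the FMM is meant to avoid.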
- Is Part Of:
- International journal of high performance computing applications. Volume 35:Number 1(2021)
- Journal:
- International journal of high performance computing applications
- Issue:
- Volume 35:Number 1(2021)
- Issue Display:
- Volume 35, Issue 1 (2021)
- Year:
- 2021
- Volume:
- 35
- Issue:
- 1
- Issue Sort Value:
- 2021-0035-0001-0000
- Page Start:
- 97
- Page End:
- 117
- Publication Date:
- 2021-01
- Subjects:
- Fast multipole method -- Multipole-to-Local -- molecular dynamics -- electrostatics -- CUDA
High performance computing -- Periodicals
Supercomputers -- Periodicals
004.1105
- Journal URLs:
- http://hpc.sagepub.com
http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
- DOI:
- 10.1177/1094342020964857
- Languages:
- English
- ISSNs:
- 1094-3420
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14853.xml