Developing a scalable hybrid MPI/OpenMP unstructured finite element model. (30th March 2015)
- Record Type:
- Journal Article
- Title:
- Developing a scalable hybrid MPI/OpenMP unstructured finite element model. (30th March 2015)
- Main Title:
- Developing a scalable hybrid MPI/OpenMP unstructured finite element model
- Authors:
- Guo, Xiaohu
Lange, Michael
Gorman, Gerard
Mitchell, Lawrence
Weiland, Michèle
- Abstract:
- Highlights: Threaded preconditioners have been experimented with. The multigrid performance has been investigated in detail. The code strong-scales up to 16 thousand cores. Achievements and lessons from the overall OpenMP implementation are reviewed. Abstract: The trend of all modern computer architectures, and the path to exascale, is towards increasing numbers of lower-power cores, with a decreasing memory-to-core ratio. This imposes a strong evolutionary pressure on algorithms and software to efficiently utilise all levels of parallelism available on a given platform while minimising data movement. Unstructured finite element codes have long been effectively parallelised using domain decomposition methods, implemented using libraries such as the Message Passing Interface (MPI). However, there are many optimisation opportunities when threading is used for intra-node parallelisation on the latest multi-core/many-core platforms. The benefits include increased algorithmic freedom, reduced memory requirements, cache sharing, a reduced number of partitions, and less MPI communication and I/O overhead. In this paper, we report progress in implementing a hybrid OpenMP–MPI version of the unstructured finite element code Fluidity. For matrix assembly kernels, the OpenMP parallel algorithm uses graph colouring to identify independent sets of elements that can be assembled concurrently with no race conditions. In this phase there are no MPI overheads, as each MPI process only assembles its own local part of the global matrix. We use an OpenMP-threaded fork of PETSc to solve the resulting sparse linear systems of equations. We experiment with a range of preconditioners, including HYPRE, which provides the algebraic multigrid preconditioner BoomerAMG, where the smoother is also threaded. Since unstructured finite element codes are well known to be memory latency bound, particular attention is paid to ccNUMA architectures, where data locality is particularly important to achieve good intra-node scaling characteristics. We also demonstrate that utilising non-blocking algorithms and libraries is critical for a mixed-mode application to achieve better parallel performance than the pure MPI version.
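The abstract's central assembly idea is that two elements conflict only if they share a mesh node (and hence write to the same rows of the global matrix), so elements of the same "colour" can be assembled concurrently without atomics or locks. A minimal sketch of that greedy colouring step follows; the mesh connectivity, function name, and data layout are illustrative assumptions, not Fluidity's actual implementation:

```python
from collections import defaultdict

def colour_elements(elements):
    """Greedy graph colouring of a finite element mesh.

    Two elements conflict (get different colours) if they share a mesh
    node, since both would contribute to the same global matrix rows.
    `elements` is a list of node-index tuples, one tuple per element.
    """
    # Map each mesh node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    # Conflict graph: elements sharing any node are neighbours.
    neighbours = defaultdict(set)
    for elems in node_to_elems.values():
        for a in elems:
            for b in elems:
                if a != b:
                    neighbours[a].add(b)

    # Greedy colouring: give each element the smallest colour not
    # already used by one of its neighbours.
    colour = {}
    for e in range(len(elements)):
        used = {colour[n] for n in neighbours[e] if n in colour}
        c = 0
        while c in used:
            c += 1
        colour[e] = c
    return colour

# Toy mesh: four triangles given as node-index triples (hypothetical).
elements = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (4, 5, 6)]
colours = colour_elements(elements)
```

In the paper's scheme, assembly then loops over colours: all elements within one colour class touch disjoint node sets, so a threaded (e.g. OpenMP) loop over that class is race-free by construction.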
- Is Part Of:
- Computers & fluids. Volume 110(2015)
- Journal:
- Computers & fluids
- Issue:
- Volume 110(2015)
- Issue Display:
- Volume 110, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 110
- Issue:
- 2015
- Issue Sort Value:
- 2015-0110-2015-0000
- Page Start:
- 227
- Page End:
- 234
- Publication Date:
- 2015-03-30
- Subjects:
- Fluidity-ICOM -- OpenMP -- MPI -- FEM -- Matrix assembly -- Sparse linear solver -- HYPRE -- PETSc -- SpMV
Fluid dynamics -- Data processing -- Periodicals
532.050285
- Journal URLs:
- http://www.journals.elsevier.com/computers-and-fluids/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compfluid.2014.09.007
- Languages:
- English
- ISSNs:
- 0045-7930
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.690000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 7367.xml