Vectorizing unstructured mesh computations for many‐core architectures. (28th August 2015)
- Record Type:
- Journal Article
- Title:
- Vectorizing unstructured mesh computations for many‐core architectures. (28th August 2015)
- Main Title:
- Vectorizing unstructured mesh computations for many‐core architectures
- Authors:
- Reguly, I. Z.
László, Endre
Mudalige, Gihan R.
Giles, Mike B.
- Abstract:
- Summary: Achieving optimal performance on the latest multi‐core and many‐core architectures increasingly depends on making efficient use of the hardware's vector units. This paper presents results on achieving high performance through vectorization on CPUs and the Xeon‐Phi on a key class of irregular applications: unstructured mesh computations. Using single instruction multiple thread (SIMT) and single instruction multiple data (SIMD) programming models, we show how unstructured mesh computations map to OpenCL or vector intrinsics through the use of code generation techniques in the OP2 Domain Specific Library and explore how irregular memory accesses and race conditions can be organized on different hardware. We benchmark Intel Xeon CPUs and the Xeon‐Phi, using a tsunami simulation and a representative CFD benchmark. Results are compared with previous work on CPUs and NVIDIA GPUs to provide a comparison of achievable performance on current many‐core systems. We show that auto‐vectorization and the OpenCL SIMT model do not map efficiently to CPU vector units because of vectorization issues and threading overheads. In contrast, using SIMD vector intrinsics imposes some restrictions and requires more involved programming techniques but results in efficient code and near‐optimal performance, two times faster than non‐vectorized code. We observe that the Xeon‐Phi does not provide good performance for these applications but is still comparable with a pair of mid‐range Xeon chips. Copyright © 2015 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 28:Number 2(2016)
- Journal:
- Concurrency and computation
- Issue:
- Volume 28:Number 2(2016)
- Issue Display:
- Volume 28, Issue 2 (2016)
- Year:
- 2016
- Volume:
- 28
- Issue:
- 2
- Issue Sort Value:
- 2016-0028-0002-0000
- Page Start:
- 557
- Page End:
- 577
- Publication Date:
- 2015-08-28
- Subjects:
- vectorization -- Xeon Phi -- AVX -- CUDA -- unstructured grid -- programming abstraction
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.3621
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 2672.xml
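The abstract refers to organizing irregular memory accesses and race conditions when unstructured mesh loops are vectorized. The pure-Python sketch below models that problem under stated assumptions. It is not OP2 code and not the paper's implementation. The four-node mesh, the `flux` kernel, the vector width of 4, and every function name are hypothetical. `simd_scatter_add` emulates a vector gather/add/scatter in which all lanes read before any lane writes, so two lanes incrementing the same node lose an update. Greedy edge coloring is swapped in as one common way to make the lanes of a vector independent; the paper's exact scheme may differ.

```python
# Hypothetical mesh: 4 nodes, 6 edges (every pair of nodes is connected).
EDGE2NODE = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 1)]
Q = [1.0, 2.0, 4.0, 8.0]   # hypothetical per-node state
VEC_WIDTH = 4              # assumed SIMD width, e.g. 4 doubles in a 256-bit AVX register


def flux(qa, qb):
    """Hypothetical edge kernel: difference of the two node values."""
    return qa - qb


def sequential(edges, q):
    """Scalar reference: one edge at a time, increments applied immediately."""
    res = [0.0] * len(q)
    for a, b in edges:
        f = flux(q[a], q[b])
        res[a] -= f
        res[b] += f
    return res


def simd_scatter_add(res, idx, vals):
    """Emulate a vector gather/add/scatter: every lane reads before any lane
    writes, so duplicate indices within one vector keep only the last update."""
    gathered = [res[i] for i in idx]
    for i, g, v in zip(idx, gathered, vals):
        res[i] = g + v


def vectorized(batches, q):
    """Treat each batch of edges as one SIMD vector of lanes."""
    res = [0.0] * len(q)
    for batch in batches:
        fl = [flux(q[a], q[b]) for a, b in batch]   # indirect reads + compute
        simd_scatter_add(res, [a for a, _ in batch], [-f for f in fl])
        simd_scatter_add(res, [b for _, b in batch], fl)
    return res


def color_edges(edges):
    """Greedy coloring: no two edges of the same color share a node."""
    colors, touched = [], []   # touched[c] = nodes already used by color c
    for a, b in edges:
        c = next((k for k, s in enumerate(touched) if a not in s and b not in s),
                 len(touched))
        if c == len(touched):
            touched.append(set())
        touched[c].update((a, b))
        colors.append(c)
    return colors


def chunk(seq, width):
    """Split a list into consecutive vectors of at most `width` lanes."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


naive = chunk(EDGE2NODE, VEC_WIDTH)
colors = color_edges(EDGE2NODE)
colored = [batch
           for c in sorted(set(colors))
           for batch in chunk([e for e, k in zip(EDGE2NODE, colors) if k == c], VEC_WIDTH)]

print(sequential(EDGE2NODE, Q))   # [11.0, 7.0, -1.0, -17.0]
print(vectorized(naive, Q))       # [7.0, 7.0, 2.0, -17.0]  (lost updates on nodes 0 and 2)
print(vectorized(colored, Q))     # [11.0, 7.0, -1.0, -17.0]
```

In the uncolored first vector, three lanes all increment node 0, so only the last write survives. Within each color no node appears twice, so every scatter is conflict-free and matches the scalar result. The trade-off in this sketch is that the loop is split into one pass per color, each with fewer edges per vector.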