Static code transformations for thread‐dense memory accesses in GPU computing. (18th October 2019)
- Record Type:
- Journal Article
- Title:
- Static code transformations for thread‐dense memory accesses in GPU computing. (18th October 2019)
- Main Title:
- Static code transformations for thread‐dense memory accesses in GPU computing
- Authors:
- Kim, Hyunjun
Hong, Sungin
Park, Jeonghwan
Han, Hwansoo
- Abstract:
- Summary: Due to the GPU's complex memory system and massive thread‐level parallelism, application programmers often have difficulty optimizing GPU programs. An essential approach to memory optimization is to use low‐latency on‐chip memory to avoid the high latency of off‐chip memory accesses. Shared memory is an on‐chip memory that is explicitly managed by programmers. Shared memory has a read/write latency similar to that of the L1 cache, but poor data management can degrade performance. In this paper, we present a static code transformation that preloads datasets into the GPU's shared memory. Our static analysis primarily targets global memory requests with high thread‐density for preloading into shared memory. A thread‐dense memory access pattern is one in which many threads efficiently manage the address space of shared memory and reuse the same data within a thread block. When selecting datasets for preloading, we limit the usage of shared memory so that thread‐level parallelism remains at the same level. Finally, our source‐to‐source compiler preloads the selected datasets into shared memory by transforming non‐optimized GPU kernel code. Our methods achieve speedups of 1.26× and 1.62× on average (geometric mean) on GTX980 and P100 GPUs, respectively.
- Is Part Of:
- Concurrency and computation. Volume 32:Number 5(2020)
- Journal:
- Concurrency and computation
- Issue:
- Volume 32:Number 5(2020)
- Issue Display:
- Volume 32, Issue 5 (2020)
- Year:
- 2020
- Volume:
- 32
- Issue:
- 5
- Issue Sort Value:
- 2020-0032-0005-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2019-10-18
- Subjects:
- code transformation -- GPU computing -- shared memory -- static analysis
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.5512
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 12795.xml