Auto‐tuning of level 1 and level 2 BLAS for GPUs. (25th September 2012)
- Record Type:
- Journal Article
- Title:
- Auto‐tuning of level 1 and level 2 BLAS for GPUs. (25th September 2012)
- Main Title:
- Auto‐tuning of level 1 and level 2 BLAS for GPUs
- Authors:
- Sørensen, Hans Henrik Brandenborg
Hidalgo, Jose Ignacio
Fernández‐de‐Vega, Francisco
Amor, Margarita
Doallo, Ramón
Fraguela, Basilio B.
Herrero, José R.
Quintana‐Ortí, Enrique S.
Strzodka, Robert
- Abstract:
- SUMMARY. The use of high‐performance libraries for dense linear algebra operations is of great importance in many numerical scientific applications. The most common operations form the backbone of the Basic Linear Algebra Subroutines (BLAS) library. In this paper, we consider the performance and auto‐tuning of level 1 and level 2 BLAS routines on graphical processing units. As examples, we develop single‐precision Compute Unified Device Architecture kernels for three of the most popular operations, the Euclidean norm (SNRM2), the matrix–vector multiplication (SGEMV), and the triangular solution (STRSV). The target hardware is the most recent Nvidia (Santa Clara, CA, USA) Tesla 20‐series (Fermi architecture), which is designed from the ground up for high‐performance computing. We show that achieving high performance for level 1 and level 2 BLAS operations is essentially a matter of fully utilizing the fine‐grained parallelism of the many‐core graphical processing unit. We show that auto‐tuning can be successfully applied to kernels for these operations so that they perform well for all input sizes. Copyright © 2012 John Wiley & Sons, Ltd.
- Is Part Of:
- Concurrency and computation. Volume 25:Number 8(2013:Jun.)
- Journal:
- Concurrency and computation
- Issue:
- Volume 25:Number 8(2013:Jun.)
- Issue Display:
- Volume 25, Issue 8 (2013)
- Year:
- 2013
- Volume:
- 25
- Issue:
- 8
- Issue Sort Value:
- 2013-0025-0008-0000
- Page Start:
- 1183
- Page End:
- 1198
- Publication Date:
- 2012-09-25
- Subjects:
- Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.2916
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 4365.xml