A methodology for speeding up loop kernels by exploiting the software information and the memory architecture. (April 2015)
- Record Type:
- Journal Article
- Title:
- A methodology for speeding up loop kernels by exploiting the software information and the memory architecture. (April 2015)
- Main Title:
- A methodology for speeding up loop kernels by exploiting the software information and the memory architecture
- Authors:
- Kelefouras, Vasilios
Kritikakou, Angeliki
Goutis, Costas - Abstract:
- Abstract: It is well-known that today׳s compilers and state of the art libraries have three major drawbacks. First, the compiler sub-problems are optimized separately; this is not efficient because the separate sub-problems optimization gives a different schedule for each sub-problem and these schedules cannot coexist as the refining of one, causes the degradation of another. Second, they take into account only part of the specific algorithm׳s information. Third, they take into account only a few hardware architecture parameters. These approaches cannot give an optimal solution. In this paper, a new methodology/pre-compiler is introduced, which speeds up loop kernels, by overcoming the above problems. This methodology solves four of the major scheduling sub-problems, together as one problem and not separately; these are the sub-problems of finding the schedules with the minimum numbers of (i) L1 data cache accesses, (ii) L2 data cache accesses, (iii) main memory data accesses, (iv) addressing instructions. First, the exploration space (possible solutions) is found according to the algorithm׳s information, e.g. array subscripts. Then, the exploration space is decreased by orders of magnitude, by applying constraint propagation to the software and hardware parameters. We take the C-code and the memory architecture parameters as input and we automatically produce a new faster C-code; this code cannot be obtained by applying the existing compiler transformations to the originalAbstract: It is well-known that today׳s compilers and state of the art libraries have three major drawbacks. First, the compiler sub-problems are optimized separately; this is not efficient because the separate sub-problems optimization gives a different schedule for each sub-problem and these schedules cannot coexist as the refining of one, causes the degradation of another. Second, they take into account only part of the specific algorithm׳s information. Third, they take into account only a few hardware architecture parameters. These approaches cannot give an optimal solution. In this paper, a new methodology/pre-compiler is introduced, which speeds up loop kernels, by overcoming the above problems. This methodology solves four of the major scheduling sub-problems, together as one problem and not separately; these are the sub-problems of finding the schedules with the minimum numbers of (i) L1 data cache accesses, (ii) L2 data cache accesses, (iii) main memory data accesses, (iv) addressing instructions. First, the exploration space (possible solutions) is found according to the algorithm׳s information, e.g. array subscripts. Then, the exploration space is decreased by orders of magnitude, by applying constraint propagation to the software and hardware parameters. We take the C-code and the memory architecture parameters as input and we automatically produce a new faster C-code; this code cannot be obtained by applying the existing compiler transformations to the original code. The proposed methodology has been evaluated for five well-known algorithms in both general and embedded processors; it is compared with gcc and clang compilers and also with iterative compilation. Abstract : Highlights: For the first time, the software optimization problem is addressed theoretically. Exploit the software structure and the hardware architecture parameters. Solve the major scheduling subproblems together as one problem and not separately. … (more)
- Is Part Of:
- Computer languages, systems & structures. Volume 41(2015)
- Journal:
- Computer languages, systems & structures
- Issue:
- Volume 41(2015)
- Issue Display:
- Volume 41, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 41
- Issue:
- 2015
- Issue Sort Value:
- 2015-0041-2015-0000
- Page Start:
- 21
- Page End:
- 41
- Publication Date:
- 2015-04
- Subjects:
- Data reuse -- Register allocation -- Optimization -- Memory hierarchy -- Loop tiling -- Data locality -- Diophantine equations
Programming languages (Electronic computers) -- Periodicals
Computer networks -- Periodicals
Computer architecture -- Periodicals
Computer systems -- Periodicals
Langage de programmation
Réseau d'ordinateurs
Architecture d'ordinateur
Périodique électronique (Descripteur de forme)
Ressource Internet (Descripteur de forme)
005.13 - Journal URLs:
- http://www.sciencedirect.com/science/journal/14778424/40 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.cl.2015.01.003 ↗
- Languages:
- English
- ISSNs:
- 1477-8424
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3394.071000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 4832.xml