Tile/line access cache memory based on a multi‐level Z‐order tiling data layout. (17th December 2017)
- Record Type:
- Journal Article
- Title:
- Tile/line access cache memory based on a multi‐level Z‐order tiling data layout. (17th December 2017)
- Main Title:
- Tile/line access cache memory based on a multi‐level Z‐order tiling data layout
- Authors:
- Wang, Baokang
Fukazawa, Yuki
Kondo, Toshio
Sasaki, Takahiro
- Other Names:
- Limet, Sébastien (guest editor)
Merlo, Alessio (guest editor)
Spalazzi, Luca (guest editor)
- Abstract:
- Summary: Ineffective column‐directional cache memory access has become a bottleneck for efficient two‐dimensional (2‐D) data processing utilizing extended single instruction multiple data (SIMD) instructions. To solve this problem, we propose a cache memory with tile (column and row directions) and line (row direction) accessibility for efficient 2‐D data processing. 2‐D data access to the proposed cache memory is enabled via a hardware‐based multi‐mode address translation unit that eliminates the overhead of software‐based address calculation. To reduce the hardware overhead of the proposed cache, we propose a tag memory reduction method that replaces multiple tiles with an aligned tile set (RATS) in the cache. To verify the feasibility of the proposed cache, an LSI layout of a SIMD‐based general purpose‐oriented datapath embedding the proposed cache is designed in a 2.5×5 mm² area using 0.18‐μm CMOS technology. Under a 3.9‐ns clock period (250 MHz), the read latency is limited to 3 clock cycles, which is the same as that for the conventional cache memory. Using the RATS method, the entire hardware overhead of the proposed cache is reduced to only 7% of that required for a conventional cache. In addition, simulation results for the proposed cache indicate a considerable reduction of L1 and L2 cache confliction misses compared with a conventional cache in power‐of‐two matrix size due to the column‐directional address stride being sufficiently smaller than page size. Therefore, the proposed cache provides efficient column‐directional parallel access as same as row‐directional parallel access so that it enables efficient SIMD operation requiring no transposition in matrix multiplication (MM). For LU decomposition (LUD), the proposed cache can provide almost the same performance to the column‐major–based LUD program as that to the row‐major–based LUD program. These results show that the proposed cache does not restrict our freedom in selecting either row‐ or column‐major order coding. …
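- Illustrative note: the abstract attributes the reduction in conflict misses to the smaller column-directional address stride of a Z-order tiled layout. The sketch below is a software illustration of that general idea only, not the article's hardware address translation unit or its multi-level layout. It assumes a single level of Z-order (Morton) tiling, and the function names, the 4×4 tile size, the 8-element cache line, and the 64-set cache are all hypothetical values chosen for the example.

```python
def interleave_bits(row: int, col: int, bits: int = 16) -> int:
    """Morton (Z-order) index: col bits go to even positions, row bits to odd."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)
        z |= ((row >> i) & 1) << (2 * i + 1)
    return z


def row_major_address(row: int, col: int, width: int) -> int:
    """Element offset in a conventional row-major matrix."""
    return row * width + col


def z_tiled_address(row: int, col: int, tile: int = 4) -> int:
    """Element offset when tile x tile tiles are stored in Z-order.

    Elements inside a tile are kept row-major (an assumption of this sketch).
    """
    tile_index = interleave_bits(row // tile, col // tile)
    return tile_index * tile * tile + (row % tile) * tile + (col % tile)


def sets_touched(addresses, line_elems: int = 8, num_sets: int = 64) -> int:
    """Number of distinct cache sets hit, for a hypothetical set-indexed cache."""
    return len({(a // line_elems) % num_sets for a in addresses})


# Walk down column 0 of a 1024-wide (power-of-two) matrix for 64 rows.
rows = range(64)
row_major_walk = [row_major_address(r, 0, 1024) for r in rows]
z_tiled_walk = [z_tiled_address(r, 0) for r in rows]

print("row-major sets touched:", sets_touched(row_major_walk))
print("Z-tiled sets touched:  ", sets_touched(z_tiled_walk))
```

Under these assumed parameters, the row-major column walk maps every access to the same cache set, while the Z-tiled walk spreads across several sets, which is the kind of effect the abstract describes for power-of-two matrix sizes.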
- Is Part Of:
- Concurrency and computation. Volume 30:Number 9(2018)
- Journal:
- Concurrency and computation
- Issue:
- Volume 30:Number 9(2018)
- Issue Display:
- Volume 30, Issue 9 (2018)
- Year:
- 2018
- Volume:
- 30
- Issue:
- 9
- Issue Sort Value:
- 2018-0030-0009-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2017-12-17
- Subjects:
- cache memory -- cache locality -- tile -- Z‐Morton layout -- 2‐D data processing
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.4375
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 9356.xml