Ultra-fast and efficient implementation schemes of complex matrix multiplication algorithm for VLIW architectures. (September 2022)
- Record Type:
- Journal Article
- Title:
- Ultra-fast and efficient implementation schemes of complex matrix multiplication algorithm for VLIW architectures. (September 2022)
- Main Title:
- Ultra-fast and efficient implementation schemes of complex matrix multiplication algorithm for VLIW architectures
- Authors:
- Najoui, Mohamed
Bahtat, Mounir
Klilou, Abdessamad
Hatim, Anas
Belkouch, Said
Jbari, Atman
Chabini, Noureddine
- Abstract:
- Highlights: Design a fast-parallel low-level kernel of the Complex Matrix Multiplication algorithm based on modulo-scheduling, software pipelining and loop unrolling techniques. Suggest a novel approach of implementing the Complex Matrix Multiplication algorithm based on the fast-parallel kernel and the miss-pipelining technique. Introduce an ultra-optimized parallel implementation approach based on the fast-parallel kernel and the internal direct memory access data transfer technique. Accelerate the beamforming and Doppler Filter Bank algorithms to meet tight real-time constraints of radar applications.
Abstract: The Complex Matrix Multiplication (CMM) algorithm is known to require high computing performance and to present exceptional challenges in real-life applications. Recent advances in Very Long Instruction Word (VLIW) based Digital Signal Processors (DSPs) have demonstrated high computing capabilities with very low power consumption. In this work, we propose three ultra-fast, parallel and efficient VLIW implementation approaches of the CMM algorithm, which could be used to meet the tighter real-time constraints of several signal and image processing applications such as radars. A novel parallel kernel, a task mapping strategy and low-level optimization techniques are suggested to fit a set of modern VLIW architectures. Additionally, an original memory access management technique was adopted to accelerate the algorithm by avoiding cache misses and bank conflicts. The experimental results showed the effectiveness of the proposed approaches: a peak performance of 15.89 GFLOPS was achieved on one C66x DSP core with a core utilization of 99%, with speedups of about 1.61, 3 and 10 compared to the state-of-the-art, the most optimized vendor and the conventional approaches, respectively.
- Is Part Of:
- Computers & electrical engineering. Volume 102(2022)
- Journal:
- Computers & electrical engineering
- Issue:
- Volume 102(2022)
- Issue Display:
- Volume 102, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 102
- Issue:
- 2022
- Issue Sort Value:
- 2022-0102-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Complex Matrix Multiplication -- Signal and Image processing -- Radars -- Parallel implementation -- VLIW, DSP
Computer engineering -- Periodicals
Electrical engineering -- Periodicals
Electrical engineering -- Data processing -- Periodicals
Ordinateurs -- Conception et construction -- Périodiques
Électrotechnique -- Périodiques
Électrotechnique -- Informatique -- Périodiques
Computer engineering
Electrical engineering
Electrical engineering -- Data processing
Periodicals
Electronic journals
621.302854
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00457906/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compeleceng.2022.108294
- Languages:
- English
- ISSNs:
- 0045-7906
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.680000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23282.xml