GPU implementation of a parallel two‐list algorithm for the subset‐sum problem. (9th January 2014)
- Record Type:
- Journal Article
- Title:
- GPU implementation of a parallel two‐list algorithm for the subset‐sum problem. (9th January 2014)
- Main Title:
- GPU implementation of a parallel two‐list algorithm for the subset‐sum problem
- Authors:
- Wan, Lanjun
Li, Kenli
Liu, Jing
Li, Keqin - Abstract:
- <abstract abstract-type="main" id="cpe3201-abs-0001"> <title>SUMMARY</title> <p id="cpe3201-para-0001">The subset‐sum problem is a well‐known non‐deterministic polynomial‐time complete (NP‐complete) decision problem. This paper proposes a novel and efficient implementation of a parallel <italic>two‐list</italic> algorithm for solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stage. It is not easy to effectively implement the three stages of the algorithm on a GPU. Ways to achieve better performance, reasonable task distribution between CPU and GPU, effective GPU memory management, and CPU–GPU communication cost minimization are discussed. The generation stage of the algorithm adopts a typical recursive <italic>divide‐and‐conquer</italic> strategy. Because recursion cannot be well supported by current GPUs with compute capability less than 3.5, a new vector‐based iterative implementation mechanism is designed to replace the explicit recursion. Furthermore, to optimize the performance of the GPU implementation, this paper improves the three stages of the algorithm. The experimental results show that the GPU implementation has much better performance than the CPU implementation and can achieve high speedup on different GPU cards. The experimental results also illustrate that the improved algorithm can bring significant performance benefits for<abstract abstract-type="main" id="cpe3201-abs-0001"> <title>SUMMARY</title> <p id="cpe3201-para-0001">The subset‐sum problem is a well‐known non‐deterministic polynomial‐time complete (NP‐complete) decision problem. This paper proposes a novel and efficient implementation of a parallel <italic>two‐list</italic> algorithm for solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stage. It is not easy to effectively implement the three stages of the algorithm on a GPU. Ways to achieve better performance, reasonable task distribution between CPU and GPU, effective GPU memory management, and CPU–GPU communication cost minimization are discussed. The generation stage of the algorithm adopts a typical recursive <italic>divide‐and‐conquer</italic> strategy. Because recursion cannot be well supported by current GPUs with compute capability less than 3.5, a new vector‐based iterative implementation mechanism is designed to replace the explicit recursion. Furthermore, to optimize the performance of the GPU implementation, this paper improves the three stages of the algorithm. The experimental results show that the GPU implementation has much better performance than the CPU implementation and can achieve high speedup on different GPU cards. The experimental results also illustrate that the improved algorithm can bring significant performance benefits for the GPU implementation. Copyright © 2014 John Wiley &amp; Sons, Ltd.</p> </abstract> … (more)
- Is Part Of:
- Concurrency and computation. Volume 27:Number 1(2015:Jan.)
- Journal:
- Concurrency and computation
- Issue:
- Volume 27:Number 1(2015:Jan.)
- Issue Display:
- Volume 27, Issue 1 (2015)
- Year:
- 2015
- Volume:
- 27
- Issue:
- 1
- Issue Sort Value:
- 2015-0027-0001-0000
- Page Start:
- 119
- Page End:
- 145
- Publication Date:
- 2014-01-09
- Subjects:
- Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35 - Journal URLs:
- http://onlinelibrary.wiley.com/ ↗
- DOI:
- 10.1002/cpe.3201 ↗
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store - Ingest File:
- 3684.xml