Parallelizing filter-and-verification based exact set similarity joins on multicores. Issue 108 (September 2022)
- Record Type:
- Journal Article
- Title:
- Parallelizing filter-and-verification based exact set similarity joins on multicores. Issue 108 (September 2022)
- Main Title:
- Parallelizing filter-and-verification based exact set similarity joins on multicores
- Authors:
- Fier, Fabian
Freytag, Johann-Christoph
- Abstract:
- Set similarity join (SSJ) is a well-studied problem with many algorithms proposed to speed up its performance. However, its scalability and performance are rarely discussed in modern multicore environments. Existing algorithms assume a single-threaded execution that leaves the abundant parallelism provided by modern machines unused, or use distributed setups that may not yield efficient runtimes and speedups that are proportional to the amount of hardware resources (e.g., CPU cores). In this paper, we focus on a widely-used family of SSJ algorithms that are based on the filter-and-verification paradigm, and study the potential of speeding them up in the context of multicore machines. We adapt state-of-the-art SSJ algorithms including PPJoin and AllPairs. Our experiments using 12 real-world datasets highlight important findings: (1) using the exact number of hardware-provided hyperthreads leads to optimal runtimes for most experiments, (2) hand-crafted data structures do not always lead to better performance, and (3) PPJoin's position filter is more effective in the multithreaded case compared to the single-threaded execution.
Highlights: Multi-threading has not yet been considered to speed up set similarity joins. We propose a novel data-parallel set similarity join algorithm. Multi-threading speeds up the set similarity join 2 to 10 times. Implementation optimizations are not beneficial for the runtime.
- Is Part Of:
- Information systems. Issue 108(2022)
- Journal:
- Information systems
- Issue:
- Issue 108(2022)
- Issue Display:
- Volume 108, Issue 108 (2022)
- Year:
- 2022
- Volume:
- 108
- Issue:
- 108
- Issue Sort Value:
- 2022-0108-0108-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Set similarity join -- Parallelization -- Multi-threading -- Multi-core -- Filter-and-verification
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2021.101912
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21544.xml