Similarity Joins: Their implementation and interactions with other database operators. (August 2015)
- Record Type:
- Journal Article
- Title:
- Similarity Joins: Their implementation and interactions with other database operators. (August 2015)
- Main Title:
- Similarity Joins: Their implementation and interactions with other database operators
- Authors:
- Silva, Yasin N.
Pearson, Spencer S.
Chon, Jaime
Roberts, Ryan
- Abstract:
- Similarity Joins are extensively used in multiple application domains and are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Joins as physical database operators. In this paper, we focus on the study, design, implementation, and optimization of a Similarity Join database operator for metric spaces. We present DBSimJoin, a physical database operator that integrates techniques to: enable a non-blocking behavior, prioritize the early generation of results, and fully support the database iterator interface. The proposed operator can be used with multiple distance functions and data types. We describe the changes in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. We also study ways in which DBSimJoin can be combined with other similarity and non-similarity operators to answer more complex queries, and how DBSimJoin can be used in query transformation rules to improve query performance. The extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches and scales very well when important parameters like ε, data size, and number of dimensions increase.
- Is Part Of:
- Information systems. Volume 52(2015)
- Journal:
- Information systems
- Issue:
- Volume 52(2015)
- Issue Display:
- Volume 52, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 52
- Issue:
- 2015
- Issue Sort Value:
- 2015-0052-2015-0000
- Page Start:
- 149
- Page End:
- 162
- Publication Date:
- 2015-08
- Subjects:
- Similarity Join -- Database operator -- Similarity queries -- PostgreSQL -- Query processing and optimization
Database management -- Periodicals
Electronic data processing -- Periodicals
Databases -- Management -- Periodicals
Computer science -- Periodicals
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2015.01.008
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5681.xml
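The abstract defines a Similarity Join as returning all data pairs whose distance is smaller than a threshold ε. The sketch below illustrates only that definition. It does not reproduce DBSimJoin, whose internal techniques are not described in this record. It is a naive nested-loop ε-join written as a Python generator. The function name `similarity_join` and the use of Euclidean distance via `math.dist` are illustrative assumptions, and any metric can be passed as `dist`. Because the generator yields each qualifying pair as soon as it is found, a consumer can pull results one at a time. This loosely mirrors the iterator-style, early-result behavior the abstract mentions.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def similarity_join(
    left: Sequence[T],
    right: Sequence[T],
    eps: float,
    dist: Callable[[T, T], float],
) -> Iterator[Tuple[T, T]]:
    """Yield every pair (a, b) from left x right with dist(a, b) < eps.

    Hypothetical nested-loop baseline (not DBSimJoin): it performs
    len(left) * len(right) distance computations. Pairs are yielded
    lazily, so the caller can consume results before the join finishes.
    """
    for a in left:
        for b in right:
            # Strict inequality, matching "distances smaller than ε".
            if dist(a, b) < eps:
                yield (a, b)


if __name__ == "__main__":
    r = [(0.0, 0.0), (5.0, 5.0)]
    s = [(0.5, 0.0), (5.0, 6.5), (10.0, 10.0)]
    # With eps = 1.0, only the pair at distance 0.5 qualifies.
    print(list(similarity_join(r, s, 1.0, math.dist)))
    # prints [((0.0, 0.0), (0.5, 0.0))]
```

Raising `eps` to 2.0 also admits `((5.0, 5.0), (5.0, 6.5))`, whose distance is 1.5. The quadratic number of distance computations in this baseline is the cost that dedicated Similarity Join operators generally try to reduce.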