Category- and selection-enabled nearest neighbor joins. (August 2017)
- Record Type:
- Journal Article
- Title:
- Category- and selection-enabled nearest neighbor joins. (August 2017)
- Main Title:
- Category- and selection-enabled nearest neighbor joins
- Authors:
- Cafagna, Francesco
Böhlen, Michael H.
Bracher, Annelies - Abstract:
- Abstract: This paper proposes a category- and selection-enabled nearest neighbor join (NNJ) between relationr and relations, with similarity on T and support for category attributesC and selection predicate θ . Our solution does not suffer from redundant fetches and index false hits, which are the main performance bottlenecks of current nearest neighbor join techniques. A category-enabled NNJ leverages the category attributesC for query evaluation. For example, the categories of relationr can be used to limit relations accessed at most once. Solutions that are not category-enabled must process each category independently and end up fetching, either from disk or memory, the blocks of the input relations multiple times. A selection-enabled NNJ performs well independent of whether the DBMS optimizer pushes the selection down or evaluates it on the fly. In contrast, index-based solutions suffer from many index false hits or end up in an expensive nested loop. Our solution does not constrain the physical design, and is efficient for row- as well as column-stores. Current solutions for column-stores use late materialization, which is only efficient if the data is clustered on the category attributesC . Our evaluation algorithm finds, for each outer tuple r, the inner tuples that satisfy the equality on the category and have the smallest distance to r with only one scan of both inputs. We experimentally evaluate our solution using a data warehouse that manages analyses of animalAbstract: This paper proposes a category- and selection-enabled nearest neighbor join (NNJ) between relationr and relations, with similarity on T and support for category attributesC and selection predicate θ . Our solution does not suffer from redundant fetches and index false hits, which are the main performance bottlenecks of current nearest neighbor join techniques. A category-enabled NNJ leverages the category attributesC for query evaluation. For example, the categories of relationr can be used to limit relations accessed at most once. Solutions that are not category-enabled must process each category independently and end up fetching, either from disk or memory, the blocks of the input relations multiple times. A selection-enabled NNJ performs well independent of whether the DBMS optimizer pushes the selection down or evaluates it on the fly. In contrast, index-based solutions suffer from many index false hits or end up in an expensive nested loop. Our solution does not constrain the physical design, and is efficient for row- as well as column-stores. Current solutions for column-stores use late materialization, which is only efficient if the data is clustered on the category attributesC . Our evaluation algorithm finds, for each outer tuple r, the inner tuples that satisfy the equality on the category and have the smallest distance to r with only one scan of both inputs. We experimentally evaluate our solution using a data warehouse that manages analyses of animal feeds. Abstract : Highlights: A category- and selection-enabled Nearest Neighbour Join (NNJ) operator. The optimizations that a category- and selection-enabled query tree efficiently uses. The integration of our query tree in row-store and column-store DBMSs. Experiments on disk, memory, and column-store DBMSs of the main NNJ solutions. … (more)
- Is Part Of:
- Information systems. Volume 68(2017)
- Journal:
- Information systems
- Issue:
- Volume 68(2017)
- Issue Display:
- Volume 68, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 68
- Issue:
- 2017
- Issue Sort Value:
- 2017-0068-2017-0000
- Page Start:
- 3
- Page End:
- 16
- Publication Date:
- 2017-08
- Subjects:
- Robust nearest neighbor join -- Similarity join -- Sort merge -- Query optimization -- Column-store -- PostgreSQL
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7 - Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.is.2017.01.006 ↗
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 1071.xml