Re-ranking via local embeddings: A use case with permutation-based indexing and the nSimplex projection. Issue 95 (January 2021)
- Record Type:
- Journal Article
- Title:
- Re-ranking via local embeddings: A use case with permutation-based indexing and the nSimplex projection. Issue 95 (January 2021)
- Main Title:
- Re-ranking via local embeddings: A use case with permutation-based indexing and the nSimplex projection
- Authors:
- Vadicamo, Lucia
Gennaro, Claudio
Falchi, Fabrizio
Chávez, Edgar
Connor, Richard
Amato, Giuseppe - Abstract:
- Abstract: Approximate Nearest Neighbor (ANN) search is a prevalent paradigm for searching intrinsically high dimensional objects in large-scale data sets. Recently, the permutation-based approach for ANN has attracted a lot of interest due to its versatility in being used in the more general class of metric spaces. In this approach, the entire database is ranked by a permutation distance to the query. Typically, permutations allow the efficient selection of a candidate set of results, but typically to achieve high recall or precision this set has to be reviewed using the original metric and data. This can lead to a sizeable percentage of the database being recalled, along with many expensive distance calculations. To reduce the number of metric computations and the number of database elements accessed, we propose here a re-ranking based on a local embedding using the nSimplex projection. The nSimplex projection produces Euclidean vectors from objects in metric spaces which possess the n-point property. The mapping is obtained from the distances to a set of reference objects, and the original metric can be lower bounded and upper bounded by the Euclidean distance of objects sharing the same set of references. Our approach is particularly advantageous for extensive databases or expensive metric function. We reuse the distances computed in the permutations in the first stage, and hence the memory footprint of the index is not increased. An extensive experimental evaluation ofAbstract: Approximate Nearest Neighbor (ANN) search is a prevalent paradigm for searching intrinsically high dimensional objects in large-scale data sets. Recently, the permutation-based approach for ANN has attracted a lot of interest due to its versatility in being used in the more general class of metric spaces. In this approach, the entire database is ranked by a permutation distance to the query. Typically, permutations allow the efficient selection of a candidate set of results, but typically to achieve high recall or precision this set has to be reviewed using the original metric and data. This can lead to a sizeable percentage of the database being recalled, along with many expensive distance calculations. To reduce the number of metric computations and the number of database elements accessed, we propose here a re-ranking based on a local embedding using the nSimplex projection. The nSimplex projection produces Euclidean vectors from objects in metric spaces which possess the n-point property. The mapping is obtained from the distances to a set of reference objects, and the original metric can be lower bounded and upper bounded by the Euclidean distance of objects sharing the same set of references. Our approach is particularly advantageous for extensive databases or expensive metric function. We reuse the distances computed in the permutations in the first stage, and hence the memory footprint of the index is not increased. An extensive experimental evaluation of our approach is presented, demonstrating excellent results even on a set of hundreds of millions of objects. Highlights: Novel methods for refining results of permutation-based Nearest Neighbors search. The proposed methods exploit pivot-based data embeddings. The original data set is not accessed at query time as done by other refining methods. Extensive experimental evaluation showing great improvements in the effectiveness. … (more)
- Is Part Of:
- Information systems. Issue 95(2021)
- Journal:
- Information systems
- Issue:
- Issue 95(2021)
- Issue Display:
- Volume 95, Issue 95 (2021)
- Year:
- 2021
- Volume:
- 95
- Issue:
- 95
- Issue Sort Value:
- 2021-0095-0095-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-01
- Subjects:
- Metric search -- Permutation-based indexing -- n-point property -- nSimplex projection -- Metric local embeddings -- Distance bounds
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7 - Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.is.2020.101506 ↗
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 14656.xml