Top-k-size keyword search on tree structured data. (January 2015)
- Record Type:
- Journal Article
- Title:
- Top-k-size keyword search on tree structured data. (January 2015)
- Main Title:
- Top-k-size keyword search on tree structured data
- Authors:
- Dimitriou, Aggeliki
Theodoratos, Dimitri
Sellis, Timos - Abstract:
- Abstract: Keyword search is the most popular technique for querying large tree-structured datasets, often of unknown structure, in the web. Recent keyword search approaches return lowest common ancestors (LCAs) of the keyword matches ranked with respect to their relevance to the keyword query. A major challenge of a ranking approach is the efficiency of its algorithms as the number of keywords and the size and complexity of the data increase. To face this challenge most of the known approaches restrict their ranking to a subset of the LCAs (e.g., SLCAs, ELCAs), missing relevant results. In this work, we design novel top-k-size stack-based algorithms on tree-structured data. Our algorithms implement ranking semantics for keyword queries which is based on the concept of LCA size. Similar to metric selection in information retrieval, LCA size reflects the proximity of keyword matches in the data tree. This semantics does not rank a predefined subset of LCAs and through a layered presentation of results, it demonstrates improved effectiveness compared to previous relevant approaches. To address performance challenges our algorithms exploit a lattice of the partitions of the keyword set, which empowers a linear time performance. This result is obtained without the support of auxiliary precomputed data structures. An extensive experimental study on various and large datasets confirms the theoretical analysis. The results show that, in contrast to other approaches, our algorithmsAbstract: Keyword search is the most popular technique for querying large tree-structured datasets, often of unknown structure, in the web. Recent keyword search approaches return lowest common ancestors (LCAs) of the keyword matches ranked with respect to their relevance to the keyword query. A major challenge of a ranking approach is the efficiency of its algorithms as the number of keywords and the size and complexity of the data increase. To face this challenge most of the known approaches restrict their ranking to a subset of the LCAs (e.g., SLCAs, ELCAs), missing relevant results. In this work, we design novel top-k-size stack-based algorithms on tree-structured data. Our algorithms implement ranking semantics for keyword queries which is based on the concept of LCA size. Similar to metric selection in information retrieval, LCA size reflects the proximity of keyword matches in the data tree. This semantics does not rank a predefined subset of LCAs and through a layered presentation of results, it demonstrates improved effectiveness compared to previous relevant approaches. To address performance challenges our algorithms exploit a lattice of the partitions of the keyword set, which empowers a linear time performance. This result is obtained without the support of auxiliary precomputed data structures. An extensive experimental study on various and large datasets confirms the theoretical analysis. The results show that, in contrast to other approaches, our algorithms scale smoothly when the size of the dataset and the number of keywords increase. Abstract : Highlights: Top-k-size keyword search over tree-structured data. Efficient multi-stack threshold and top-k algorithms exploiting a lattice of keyword partitions. The algorithms demonstrate linear time complexity and scale smoothly when the size of the dataset increases. The approach does not rely on auxiliary indices and can be used in streaming applications. Novel layered ranking semantics based on the concept of LCA size. … (more)
- Is Part Of:
- Information systems. Volume 47(2015)
- Journal:
- Information systems
- Issue:
- Volume 47(2015)
- Issue Display:
- Volume 47, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 47
- Issue:
- 2015
- Issue Sort Value:
- 2015-0047-2015-0000
- Page Start:
- 178
- Page End:
- 193
- Publication Date:
- 2015-01
- Subjects:
- Keyword search -- Tree-structured data -- LCA -- Search algorithm -- Ranking
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7 - Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.is.2014.07.002 ↗
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 5747.xml