Schema-independent querying for heterogeneous collections in NoSQL document stores. (November 2019)
- Record Type:
- Journal Article
- Title:
- Schema-independent querying for heterogeneous collections in NoSQL document stores. (November 2019)
- Main Title:
- Schema-independent querying for heterogeneous collections in NoSQL document stores
- Authors:
- Ben Hamadou, Hamdi
Ghozzi, Faiza
Péninou, André
Teste, Olivier
- Abstract:
- NoSQL document stores are well-tailored to efficiently load and manage massive collections of heterogeneous documents without any prior structural validation. However, this flexibility becomes a serious challenge when querying heterogeneous documents: the user has to build complex queries, or reformulate existing ones, whenever new schemas are introduced in a collection. In this paper we propose a novel approach, based on formal foundations, for building schema-independent queries designed to query multi-structured documents. We present a query enrichment mechanism that consults a pre-constructed dictionary. This dictionary binds each possible path in the documents to all of its corresponding absolute paths across all the documents. We automate the process of query reformulation via a set of rules that reformulate most document store operators, such as select, project, unnest, aggregate and lookup. We then produce queries across multi-structured documents that are compatible with the native query engine of the underlying document store. To evaluate our approach, we conducted experiments on synthetic datasets. Our results show that the induced overhead can be acceptable when compared to the effort needed to restructure the data or the time required to execute several queries corresponding to the different schemas inside the collection.
Highlights: Document stores offer the flexibility to store documents with heterogeneous schemas. Querying document stores requires complex queries to overcome schema heterogeneity. We propose to build queries over partial paths, regardless of document schemas. We formally define the reformulation of such queries for most document operators. Queries are extended using a dictionary to bind paths to all existing absolute paths. …
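The dictionary-based query enrichment described in the abstract can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: a partial path is taken to be any suffix of an absolute path, arrays are treated as leaves (so unnest, aggregate and lookup are not modelled), and the output is a MongoDB-style filter or projection document. The function names `absolute_paths`, `build_dictionary`, `reformulate_select` and `reformulate_project` are hypothetical.

```python
from collections import defaultdict


def absolute_paths(doc, prefix=""):
    """Yield every absolute (root-to-node) path of a nested document.

    Assumption: only nested objects are traversed; arrays and scalars are leaves.
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from absolute_paths(value, path)


def build_dictionary(collection):
    """Bind each partial path to every absolute path it can denote in the collection.

    Assumption: a partial path is any suffix of an absolute path, e.g. "a.b.c"
    contributes the keys "a.b.c", "b.c" and "c".
    """
    dictionary = defaultdict(set)
    for doc in collection:
        for abs_path in absolute_paths(doc):
            keys = abs_path.split(".")
            for i in range(len(keys)):
                dictionary[".".join(keys[i:])].add(abs_path)
    return dictionary


def reformulate_select(dictionary, path, value):
    """Rewrite an equality predicate on a partial path as a MongoDB-style filter."""
    targets = sorted(dictionary.get(path, ()))
    if not targets:
        # Unknown path: leave the predicate unchanged.
        return {path: value}
    if len(targets) == 1:
        return {targets[0]: value}
    # One disjunct per absolute path, so documents of any structure can match.
    return {"$or": [{p: value} for p in targets]}


def reformulate_project(dictionary, paths):
    """Rewrite a projection on partial paths as an inclusion projection on absolute paths."""
    return {p: 1 for path in paths for p in sorted(dictionary.get(path, ()))}


if __name__ == "__main__":
    # Two documents storing "year" at different depths.
    collection = [
        {"title": "A", "info": {"year": 2019}},
        {"title": "B", "details": {"info": {"year": 2018}}},
    ]
    d = build_dictionary(collection)
    print(reformulate_select(d, "info.year", 2019))
    # {'$or': [{'details.info.year': 2019}, {'info.year': 2019}]}
    print(reformulate_project(d, ["title", "year"]))
    # {'title': 1, 'details.info.year': 1, 'info.year': 1}
```

Because the rewritten filter uses only `$or` and dotted field paths, it stays within the store's own query language, which is the compatibility property the abstract stresses.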
- Is Part Of:
- Information systems. Volume 85(2019)
- Journal:
- Information systems
- Issue:
- Volume 85(2019)
- Issue Display:
- Volume 85, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 85
- Issue:
- 2019
- Issue Sort Value:
- 2019-0085-2019-0000
- Page Start:
- 48
- Page End:
- 67
- Publication Date:
- 2019-11
- Subjects:
- Information systems -- Document stores -- Structural heterogeneity -- Schema-independent querying
Database management -- Periodicals
Electronic data processing -- Periodicals
Databases -- Management -- Periodicals
Computer science -- Periodicals
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2019.04.005
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11052.xml