Data variety, come as you are in multi-model data warehouses. Issue 104 (February 2022)
- Record Type:
- Journal Article
- Title:
- Data variety, come as you are in multi-model data warehouses. Issue 104 (February 2022)
- Main Title:
- Data variety, come as you are in multi-model data warehouses
- Authors:
- Bimonte, Sandro
Gallinucci, Enrico
Marcel, Patrick
Rizzi, Stefano
- Abstract:
- Multi-model DBMSs (MMDBMSs) have been recently introduced to store and seamlessly query heterogeneous data (structured, semi-structured, graph-based, etc.) in their native form, aimed at effectively preserving their variety. Unfortunately, when it comes to analyzing these data, traditional data warehouses (DWs) and OLAP systems fall short because they rely on relational DBMSs for storage and querying, thus constraining data variety into the rigidity of a structured, fixed schema.
In this paper, we investigate the performances of an MMDBMS when used to store multidimensional data for OLAP analyses. A multi-model DW would store each of its elements according to its native model; among the benefits we envision for this solution, that of bridging the architectural gap between data lakes and DWs, that of reducing the cost for ETL, and that of ensuring better flexibility, extensibility, and evolvability thanks to the combined use of structured and schemaless data. To support our investigation we define a multidimensional schema for the UniBench benchmark dataset and an ad-hoc OLAP workload for it. Then we propose and compare three logical solutions implemented on the PostgreSQL multi-model DBMS: one that extends a star schema with JSON, XML, graph-based, and key–value data; one based on a classical (fully relational) star schema; and one where all data are kept in their native form (no relational data are introduced). As expected, the full-relational implementation generally performs better than the multi-model one, but this is balanced by the benefits of MMDBMSs in dealing with variety. Finally, we give our perspective view of the research on this topic.
Highlights: We investigate the performances of a multi-model DBMS to store multidimensional data for OLAP analyses. We define a multidimensional schema for the UniBench benchmark dataset and an ad-hoc OLAP workload for it. We propose and quantitatively compare three logical solutions implemented on the PostgreSQL multi-model DBMS. The querying performances of a multi-model solution are slightly worse than those of a full-relational solution. A multi-model solution brings advantages in terms of extendibility, flexibility, evolvability, ETL.
- Is Part Of:
- Information systems. Issue 104(2022)
- Journal:
- Information systems
- Issue:
- Issue 104(2022)
- Issue Display:
- Volume 104, Issue 104 (2022)
- Year:
- 2022
- Volume:
- 104
- Issue:
- 104
- Issue Sort Value:
- 2022-0104-0104-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-02
- Subjects:
- OLAP -- Multi-model databases -- Data variety -- Data warehouse
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2021.101734
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20100.xml