Graph integration of structured, semistructured and unstructured data for data journalism. Issue 104 (February 2022)
- Record Type:
- Journal Article
- Title:
- Graph integration of structured, semistructured and unstructured data for data journalism. Issue 104 (February 2022)
- Main Title:
- Graph integration of structured, semistructured and unstructured data for data journalism
- Authors:
- Anadiotis, Angelos Christos
Balalau, Oana
Conceição, Catarina
Galhardas, Helena
Haddad, Mhd Yamen
Manolescu, Ioana
Merabti, Tayeb
You, Jingmao
- Abstract:
- Digital data is a gold mine for modern journalism. However, datasets which interest journalists are extremely heterogeneous, ranging from highly structured (relational databases) and semi-structured (JSON, XML, HTML) data to graphs (e.g., RDF) and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets along the lines described above: the challenges we faced in making such graphs useful and allowing their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
Highlights: We define novel integration graphs and we construct them from arbitrary datasets. We build the graphs leveraging data integration, information extraction, and data management. We propose a novel algorithm finding matches across heterogeneous data sources. We implement our approach on text, CSV, JSON, XML, RDF, PDF and relational datasets. We evaluate our approach using a set of use cases with real journalistic datasets.
- Is Part Of:
- Information systems. Issue 104 (2022)
- Journal:
- Information systems
- Issue:
- Issue 104 (2022)
- Issue Display:
- Volume 104, Issue 104 (2022)
- Year:
- 2022
- Volume:
- 104
- Issue:
- 104
- Issue Sort Value:
- 2022-0104-0104-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-02
- Subjects:
- Data journalism -- Heterogeneous data integration -- Information extraction
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2021.101846
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20100.xml