An efficient similarity-based approach for comparing XML documents. (November 2018)
- Record Type:
- Journal Article
- Title:
- An efficient similarity-based approach for comparing XML documents. (November 2018)
- Main Title:
- An efficient similarity-based approach for comparing XML documents
- Authors:
- Oliveira, Alessandreia
Tessarolli, Gabriel
Ghiotto, Gleiph
Pinto, Bruno
Campello, Fernando
Marques, Matheus
Oliveira, Carlos
Rodrigues, Igor
Kalinowski, Marcos
Souza, Uéverton
Murta, Leonardo
Braganholo, Vanessa
- Abstract:
- Highlights: Similarity-based comparison of XML document revisions. Polynomial asymptotic complexity. Speed-up of almost 45 times compared with state-of-the-art algorithms. Superior efficiency compared with state-of-the-art algorithms. Efficacy equivalent to that of state-of-the-art algorithms.
Abstract: XML documents are widely used to interchange information among heterogeneous systems, ranging from office applications to scientific experiments. Regardless of the domain, XML documents may evolve, so identifying and understanding the changes they undergo becomes crucial. Some syntactic diff approaches have been proposed to address this problem. They are mainly designed to compare revisions of XML documents using explicit IDs to match elements. However, elements in different revisions may not share IDs due to tool incompatibility or even divergent or missing schemas. In this paper, we present Phoenix, a similarity-based approach for comparing revisions of XML documents that does not rely on explicit IDs. Phoenix uses dynamic programming and optimization algorithms to compare different features (e.g., element name, content, attributes, and sub-elements) of XML documents and calculate the degree of similarity between them. We compared Phoenix with X-Diff and XyDiff, two state-of-the-art XML diff algorithms. XyDiff was the fastest approach but failed to provide precise matching results. X-Diff presented higher efficacy in 30 of the 56 scenarios but was slow. Phoenix executed in a fraction of the running time required by X-Diff and achieved the best results in terms of efficacy in 26 of the 56 tested scenarios. In our evaluations, Phoenix was by far the most efficient approach to match elements across revisions of the same XML document. …
- Is Part Of:
- Information systems. Volume 78(2018)
- Journal:
- Information systems
- Issue:
- Volume 78(2018)
- Issue Display:
- Volume 78, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 78
- Issue:
- 2018
- Issue Sort Value:
- 2018-0078-2018-0000
- Page Start:
- 40
- Page End:
- 57
- Publication Date:
- 2018-11
- Subjects:
- XML -- Diff -- Match -- Similarity
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2018.07.001
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14528.xml
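The abstract describes comparing element name, content, attributes, and sub-elements to obtain a similarity degree without relying on explicit IDs. The following is a minimal sketch of that general idea, not the Phoenix algorithm itself: the function names, the equal feature weights, the longest-common-subsequence text measure, and the greedy pairing of sub-elements (standing in for a proper optimization step) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def lcs_ratio(a: str, b: str) -> float:
    """Text similarity in [0, 1] from the longest common subsequence,
    computed with a rolling dynamic-programming table."""
    if not a and not b:
        return 1.0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 2 * prev[-1] / (len(a) + len(b))


def attribute_similarity(x: ET.Element, y: ET.Element) -> float:
    """Fraction of attribute names (from either element) with equal values."""
    keys = set(x.attrib) | set(y.attrib)
    if not keys:
        return 1.0
    same = sum(1 for k in keys if x.attrib.get(k) == y.attrib.get(k))
    return same / len(keys)


def children_similarity(x: ET.Element, y: ET.Element) -> float:
    """Pair sub-elements greedily by descending similarity.
    Assumption: greedy pairing is a simplification, not an optimal assignment."""
    xs, ys = list(x), list(y)
    if not xs and not ys:
        return 1.0
    scored = sorted(
        ((element_similarity(a, b), i, j)
         for i, a in enumerate(xs) for j, b in enumerate(ys)),
        reverse=True,
    )
    used_x, used_y, total = set(), set(), 0.0
    for score, i, j in scored:
        if i not in used_x and j not in used_y:
            used_x.add(i)
            used_y.add(j)
            total += score
    # Unmatched sub-elements on either side lower the score.
    return total / max(len(xs), len(ys))


def element_similarity(x: ET.Element, y: ET.Element) -> float:
    """Average of four feature scores (equal weights are an assumption)."""
    name = 1.0 if x.tag == y.tag else 0.0
    content = lcs_ratio((x.text or "").strip(), (y.text or "").strip())
    attrs = attribute_similarity(x, y)
    children = children_similarity(x, y)
    return (name + content + attrs + children) / 4


if __name__ == "__main__":
    old = ET.fromstring('<book id="1"><title>XML Diff</title><year>2018</year></book>')
    new = ET.fromstring('<book id="2"><title>XML Diffing</title></book>')
    print(element_similarity(old, old))
    print(element_similarity(old, new))
```

Two revisions of an element could then be treated as a match when the score exceeds a chosen threshold; the threshold value and the feature weights would need tuning for real documents.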