A grammar-based approach for XML schema extraction and heterogeneous document integration. (25th March 2019)
- Record Type:
- Journal Article
- Main Title:
- A grammar-based approach for XML schema extraction and heterogeneous document integration
- Authors:
- Janga, Prudhvi
Davis, Karen C.
- Abstract:
- The availability of vast amounts of heterogeneous XML web data motivates finding efficient methods to search, integrate, query, and present this data. The structure of XML documents is useful for achieving these tasks; however, not every XML document on the web includes a schema. We discuss challenges and solutions in the generation and integration of XML schemas. We propose and implement a framework for efficient schema extraction and integration from heterogeneous XML document collections gathered from the web. Our approach introduces the schema extended context-free grammar (SECFG) to model XML schemas, including detection of attributes, data types, and element occurrences. Unlike other implementations, our approach supports the generation of XML schemas in any XML schema language, e.g., DTD or XSD. We compare our approach with other proposed approaches and conclude that we offer the same or better functionality more efficiently and with greater flexibility. The approach we propose is flexible enough to facilitate integration of and translation to tabular (relational) data.
- Is Part Of:
- International journal of data mining, modelling and management. Volume 11:Number 3(2019)
- Journal:
- International journal of data mining, modelling and management
- Issue:
- Volume 11:Number 3(2019)
- Issue Display:
- Volume 11, Issue 3 (2019)
- Year:
- 2019
- Volume:
- 11
- Issue:
- 3
- Issue Sort Value:
- 2019-0011-0003-0000
- Page Start:
- 235
- Page End:
- 258
- Publication Date:
- 2019-03-25
- Subjects:
- XML schema -- schema integration -- schema extraction -- schema discovery
Data mining -- Periodicals
Information science -- Periodicals
Databases -- Periodicals
005.7
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijdmmm
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1759-1163
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 10869.xml
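The abstract describes extracting a schema from XML documents that lack one, including attributes and element occurrences, and then emitting it in a concrete schema language such as DTD. As a rough illustration only, the Python sketch below infers a content model for each element from a few instance documents and prints DTD declarations. It is not the authors' SECFG construction. The function names (`infer_schema`, `occurrence`, `to_dtd`) are hypothetical. The sketch assumes that child elements appear in a consistent order, ignores mixed content, and declares every attribute as `CDATA`, because DTDs cannot express the data types that the paper also detects.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def infer_schema(xml_docs):
    """Walk each XML string and record, per element name, what was observed.

    Hypothetical helper (not the paper's SECFG): for every element name it keeps
    the first-seen order of child names, one Counter of child names per
    instance, how many instances carried each attribute, and whether any
    instance contained non-whitespace text.
    """
    info = {}

    def visit(elem):
        rec = info.setdefault(elem.tag, {
            "order": [], "instances": [], "attrs": Counter(), "text": False,
        })
        for child in elem:
            if child.tag not in rec["order"]:
                rec["order"].append(child.tag)
        rec["instances"].append(Counter(child.tag for child in elem))
        rec["attrs"].update(elem.attrib.keys())  # each attribute counted once per instance
        if elem.text and elem.text.strip():
            rec["text"] = True
        for child in elem:
            visit(child)

    for doc in xml_docs:
        visit(ET.fromstring(doc))
    return info


def occurrence(lo, hi):
    """Map the observed min/max occurrences of a child to a DTD indicator."""
    if lo >= 1:
        return "" if hi == 1 else "+"
    return "?" if hi <= 1 else "*"


def to_dtd(info):
    """Serialise the observations as DTD declarations (sequence models only)."""
    lines = []
    for name, rec in info.items():
        if rec["order"]:
            parts = []
            for child in rec["order"]:
                counts = [c[child] for c in rec["instances"]]  # Counter gives 0 where absent
                parts.append(child + occurrence(min(counts), max(counts)))
            model = "(" + ", ".join(parts) + ")"
        elif rec["text"]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        total = len(rec["instances"])
        for attr, seen in rec["attrs"].items():
            # Required only if every observed instance carried the attribute.
            default = "#REQUIRED" if seen == total else "#IMPLIED"
            lines.append(f"<!ATTLIST {name} {attr} CDATA {default}>")
    return "\n".join(lines)


if __name__ == "__main__":
    docs = [
        "<book id='1'><title>A</title><author>X</author><author>Y</author></book>",
        "<book id='2' lang='en'><title>B</title><author>Z</author><year>2019</year></book>",
    ]
    print(to_dtd(infer_schema(docs)))
```

For the two sample documents, `author` occurs once or twice per `book` and so receives `+`. `year` appears in only one `book` and receives `?`. `lang` becomes `#IMPLIED` because not every `book` carries it. Emitting XSD instead would only require swapping `to_dtd` for a different serialiser over the same observations, which loosely mirrors the abstract's point that the model is independent of the target schema language.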