DWSpyder: a new schema extraction method for a deep web integration system. (3rd October 2019)
- Record Type:
- Journal Article
- Title:
- DWSpyder: a new schema extraction method for a deep web integration system. (3rd October 2019)
- Main Title:
- DWSpyder: a new schema extraction method for a deep web integration system
- Authors:
- Saissi, Yasser
Zellou, Ahmed
Adri, Ali
- Abstract:
- The deep web is a huge part of the web that is not indexed by search engines. The deep web sources are accessible only through their associated access forms. We wish to use a web integration system to access the deep web sources and all of their information. To implement this web integration system, we need to know the schema description of each web source. The problem resolved in this paper is how to extract the schema describing an inaccessible deep web source. We propose our DWSpyder method as being able to extract the schema describing a deep web source despite its inaccessibility. The DWSpyder method starts with a static analysis of the deep web source access forms in order to extract the first elements of the associated schema description. The second step of our method is a dynamic analysis of these access forms using queries to enrich our schema description. Our DWSpyder method also uses a clustering algorithm to identify the possible values of deep web form fields with undefined sets of values. All of the information extracted is used by DWSpyder to automatically generate deep web source schema descriptions.
- Is Part Of:
- International journal of Web engineering and technology. Volume 14:Number 2(2019)
- Journal:
- International journal of Web engineering and technology
- Issue:
- Volume 14:Number 2(2019)
- Issue Display:
- Volume 14, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 14
- Issue:
- 2
- Issue Sort Value:
- 2019-0014-0002-0000
- Page Start:
- 122
- Page End:
- 150
- Publication Date:
- 2019-10-03
- Subjects:
- web integration -- schema extraction -- deep web -- clustering
World Wide Web -- Periodicals
Web site development -- Periodicals
Application software -- Development -- Periodicals
006.7
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijwet
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1476-1289
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 11473.xml