Understanding web documents: finding pagelets for transformation using structural patterns. (16th July 2008)
- Record Type:
- Journal Article
- Title:
- Understanding web documents: finding pagelets for transformation using structural patterns. (16th July 2008)
- Main Title:
- Understanding web documents: finding pagelets for transformation using structural patterns
- Authors:
- Ferrydiansyah, Reza
Parmanto, Bambang
- Abstract:
- Understanding a web document and the sections inside the document is very important for web transformation and information retrieval from web pages. Detecting pagelets, which are small features located inside a web page, in order to understand a web document's structure is a difficult problem. Current work on pagelet detection focuses only on finding the location of the pagelet without regard to its functionality. We describe a method to detect both the location and functionality of pagelets using HTML element patterns. For each pagelet type, an HTML element pattern is created and matched to a web page. Sections of the web page that match the patterns are marked as pagelet candidates. We test this technique on multiple popular web pages from the news and e-commerce genres. We find that this method adequately recalls various pagelets from the web page.
- Is Part Of:
- International journal of Web engineering and technology. Volume 4:Number 2(2008)
- Journal:
- International journal of Web engineering and technology
- Issue:
- Volume 4:Number 2(2008)
- Issue Display:
- Volume 4, Issue 3 (2008)
- Year:
- 2008
- Volume:
- 4
- Issue:
- 3
- Issue Sort Value:
- 2008-0004-0003-0000
- Page Start:
- 313
- Page End:
- 335
- Publication Date:
- 2008-07-16
- Subjects:
- pagelet detection -- segmentation -- pattern matching -- annotation -- transcoding -- world wide web -- HTML element patterns -- web documents -- web transformation -- information retrieval -- document structure
World Wide Web -- Periodicals
Web site development -- Periodicals
Application software -- Development -- Periodicals
006.7
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijwet
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1476-1289
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 8922.xml