Improving web information indexing and retrieval based on center block duplication detection. (21st July 2008)
- Record Type:
- Journal Article
- Title:
- Improving web information indexing and retrieval based on center block duplication detection. (21st July 2008)
- Main Title:
- Improving web information indexing and retrieval based on center block duplication detection
- Authors:
- Cadenhead, Tyrone
Chen, Jinlin
Cook, Terry
- Abstract:
- Duplicated information on today's Web has a serious negative impact on Web search engines: it increases the size of the index and lowers the efficiency of Web information retrieval. One important fact is that, for various reasons, a large amount of Web content duplication occurs at the block level in addition to the site and page levels. Moreover, when searching the Web, the desired information is in most cases located in the center block of a relevant page. Based on these two observations, we propose an efficient block-level duplication detection algorithm based on resemblance transitivity, and index center blocks instead of entire Web pages for Web information retrieval. Experiments show that these strategies can effectively reduce index size and index construction time without sacrificing the effectiveness of Web information retrieval.
- Is Part Of:
- International journal of innovative computing and applications. Volume 1:Number 3(2008)
- Journal:
- International journal of innovative computing and applications
- Issue:
- Volume 1:Number 3(2008)
- Issue Display:
- Volume 1, Issue 3 (2008)
- Year:
- 2008
- Volume:
- 1
- Issue:
- 3
- Issue Sort Value:
- 2008-0001-0003-0000
- Page Start:
- 194
- Page End:
- 204
- Publication Date:
- 2008-07-21
- Subjects:
- duplication detection -- inverted index -- layout structure detection -- information retrieval -- web information -- information indexing -- internet -- center block -- resemblance transitivity
Evolutionary computation -- Periodicals
Neural networks (Computer science) -- Periodicals
Genetic programming (Computer science) -- Periodicals
Biologically-inspired computing -- Periodicals
Swarm intelligence -- Periodicals
Quantum computers -- Periodicals
006.3
- Journal URLs:
- http://www.inderscience.com/browse/index.php?journalCODE=ijica
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1751-648X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 8679.xml