Mining the web with hierarchical crawlers – a resource sharing based crawling approach. (9th February 2009)
- Record Type:
- Journal Article
- Title:
- Mining the web with hierarchical crawlers – a resource sharing based crawling approach. (9th February 2009)
- Main Title:
- Mining the web with hierarchical crawlers – a resource sharing based crawling approach
- Authors:
- Kundu, Anirban
Dutta, Ruma
Dattagupta, Rana
Mukhopadhyay, Debajyoti
- Abstract:
- An important component of any web search engine is its crawler, also known as a robot or spider. An efficient set of crawlers makes a search engine more powerful, alongside its other performance factors such as its ranking algorithm, storage mechanism and indexing techniques. This paper proposes an extended technique for crawling the World Wide Web (WWW) on behalf of a search engine. The approach combines multiple crawlers working in parallel with the mechanism of focused crawling (Chakrabarti et al., 1999a, 2002; Mukhopadhyay et al., 2006). The structure of a website is divided into several levels, based on its hyperlink structure, for downloading web pages from that website (Chakrabarti et al., 1999b; Mukhopadhyay and Singh, 2004). The number of crawlers at each level is not fixed but dynamic: it is determined on demand at execution time by a threaded program, based on the number of hyperlinks in a specific web page. The paper also proposes a focused hierarchical crawling technique in which crawlers are created dynamically at runtime for different domains and crawl web pages while sharing resources.
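The level-by-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `SITE` link graph, the `fetch_links` stand-in, the `crawl_by_level` function and the `max_workers` cap are all hypothetical names chosen for illustration, and no real pages are downloaded. It only shows how the number of crawler threads at each level might be set at runtime from the number of hyperlinks discovered at the previous level.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical link graph standing in for one website (assumption, not from the paper).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2", "/"],
    "/b": ["/b1"],
    "/a1": [],
    "/a2": ["/b1"],
    "/b1": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its hyperlinks."""
    return SITE.get(url, [])


def crawl_by_level(seed, max_workers=8):
    """Crawl breadth-first, returning the pages grouped by hyperlink level."""
    visited = {seed}
    levels = [[seed]]
    frontier = [seed]
    while frontier:
        # The crawler count is decided at runtime: one thread per page
        # on the current level, capped at max_workers.
        workers = min(len(frontier), max_workers)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(fetch_links, frontier))
        next_frontier = []
        for links in results:
            for link in links:
                if link not in visited:
                    visited.add(link)
                    next_frontier.append(link)
        if next_frontier:
            levels.append(next_frontier)
        frontier = next_frontier
    return levels


if __name__ == "__main__":
    for depth, pages in enumerate(crawl_by_level("/")):
        print(depth, pages)
```

The shared `visited` set is updated only in the main thread after each level finishes, so the worker threads need no locking in this sketch. A focused or domain-specific variant would additionally filter `next_frontier` by topic or domain before the next level starts.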
- Is Part Of:
- International journal of intelligent information and database systems. Volume 3:Number 1(2009)
- Journal:
- International journal of intelligent information and database systems
- Issue:
- Volume 3:Number 1(2009)
- Issue Display:
- Volume 3, Issue 1 (2009)
- Year:
- 2009
- Volume:
- 3
- Issue:
- 1
- Issue Sort Value:
- 2009-0003-0001-0000
- Page Start:
- 90
- Page End:
- 106
- Publication Date:
- 2009-02-09
- Subjects:
- seed queues -- single crawlers -- parallel crawlers -- hierarchical crawlers -- focused crawlers -- domain specific crawlers -- resource sharing -- web mining -- web search engines -- world wide web -- hyperlinks -- web page crawling
Database management -- Computer programs -- Periodicals
Information retrieval -- Computer programs -- Periodicals
Information storage and retrieval systems -- Computer programs -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Intelligent agents (Computer software) -- Periodicals
006.33
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijiids
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1751-5858
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 8684.xml