Categorisation of web documents using extraction ontologies. (12th November 2008)
- Record Type:
- Journal Article
- Title:
- Categorisation of web documents using extraction ontologies. (12th November 2008)
- Main Title:
- Categorisation of web documents using extraction ontologies
- Authors:
- Xu, Li
Embley, David W.
- Abstract:
- Automatically recognising which HTML documents on the Web contain items of interest for a user is non-trivial. As a step toward solving this problem, we propose an approach based on information-extraction ontologies. Given HTML documents, tables, and forms, our document recognition system extracts expected ontological vocabulary (keywords and keyword phrases) and expected ontological instance data (particular values for ontological concepts). We then use machine-learned rules over this extracted information to determine whether an HTML document contains items of interest. Experimental results show that our ontological approach to categorisation works well, having achieved F-measures above 90% for all applications we tried.
- Is Part Of:
- International journal of metadata, semantics and ontologies. Volume 3:Number 1(2008)
- Journal:
- International journal of metadata, semantics and ontologies
- Issue:
- Volume 3:Number 1(2008)
- Issue Display:
- Volume 3, Issue 1 (2008)
- Year:
- 2008
- Volume:
- 3
- Issue:
- 1
- Issue Sort Value:
- 2008-0003-0001-0000
- Page Start:
- 3
- Page End:
- 20
- Publication Date:
- 2008-11-12
- Subjects:
- document categorisation -- web documents -- document classification -- extraction ontologies -- HTML documents -- information extraction -- machine learning -- internet -- information retrieval
Metadata -- Periodicals
Semantic Web -- Periodicals
Ontologies (Information retrieval) -- Periodicals
Data structures (Computer science) -- Periodicals
Information theory -- Periodicals
005.74
- Journal URLs:
- http://www.inderscience.com/browse/index.php?journalID=152
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1744-2621
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 8863.xml