Improving selection of synsets from WordNet for domain-specific word sense disambiguation. (January 2017)
- Record Type:
- Journal Article
- Title:
- Improving selection of synsets from WordNet for domain-specific word sense disambiguation. (January 2017)
- Main Title:
- Improving selection of synsets from WordNet for domain-specific word sense disambiguation
- Authors:
- Lopez-Arevalo, Ivan
Sosa-Sosa, Victor J.
Rojas-Lopez, Franco
Tello-Leal, Edgar
- Abstract:
- Highlights: Unsupervised approach for selecting the predominant synset from WordNet for instances of ambiguous words. An auxiliary corpus is generated from the test corpus by using information from the Web. Lexical information (neighbors of ambiguous words) is obtained from the test corpus. Semantic information (dependency relations) is obtained from the auxiliary corpus.
Abstract: Word Sense Disambiguation (WSD) is a fundamental task useful for Information Retrieval, Information Extraction, web search, and indexing, among others. Several works in the literature address the generic WSD task, but in recent years domain-specific WSD has attracted the attention of several researchers. In this context, this paper describes an approach for domain-specific WSD by selecting the predominant sense (synset from WordNet) of ambiguous words. To achieve this, the method uses two corpora: the domain-specific test corpus (containing target ambiguous words) and a domain-specific auxiliary corpus (obtained by using relevant words from the domain-specific test corpus). The approach has four main stages: (1) auxiliary corpus generation; (2) related features extraction (from the auxiliary corpus); (3) test features extraction (from the test corpus); and (4) features integration. The proposed approach has been tested on domain-specific corpora (Sports and Finance) and on one balanced corpus, BNC. Even though our WSD approach showed some limitations when dealing with the general-domain corpus, the obtained results for domain-specific corpora, which are our main interest, were better than those reported in previous works. …
- Is Part Of:
- Computer speech & language. Volume 41(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 41(2016)
- Issue Display:
- Volume 41, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 41
- Issue:
- 2016
- Issue Sort Value:
- 2016-0041-2016-0000
- Page Start:
- 128
- Page End:
- 145
- Publication Date:
- 2017-01
- Subjects:
- Domain-specific word sense disambiguation -- WordNet -- Synset -- Context
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.06.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2481.xml