Unsupervised Approach to Word Sense Disambiguation in Malayalam. (2016)
- Record Type:
- Journal Article
- Title:
- Unsupervised Approach to Word Sense Disambiguation in Malayalam. (2016)
- Main Title:
- Unsupervised Approach to Word Sense Disambiguation in Malayalam
- Authors:
- Sankar, K.P. Sruthi
Raj, P.C. Reghu
Jayan, V.
- Abstract:
- Word Sense Disambiguation (WSD) is the task of identifying the correct sense of a word that has multiple meanings in a specific context. WSD is an important intermediate step in many Natural Language Processing (NLP) tasks, especially Information Extraction (IE), Machine Translation (MT) and Question Answering systems. Word sense ambiguity arises when a word has more than one possible sense, and every language contains many ambiguous words. Since the sense of a word depends on the context in which it is used, disambiguation requires an understanding of word knowledge. Automatic WSD systems are available for structured languages such as English and Chinese, but Indian languages are morphologically rich, which makes processing them very complex. The aim of this work is to develop a WSD system for Malayalam, a language spoken in India, predominantly in the state of Kerala. The proposed system uses a corpus collected from various Malayalam web documents. For each possible sense of the ambiguous word, a relatively small set of training examples (a seed set) that represents the sense is identified; collocations and the most frequently co-occurring words are used as training examples. A seed set expansion module extends each seed set by adding the words most similar to its elements, and these extended sets act as sense clusters. The sense cluster most similar to the context of the input text is taken as the sense of the target word. …
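The pipeline described in the abstract (seed sets per sense, seed set expansion by similarity, then choosing the sense cluster most similar to the input context) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure or context window, so the sketch assumes window-based co-occurrence counts compared with cosine similarity. The toy corpus, the seed words, the stand-in target word "bank" and all function names (`cooccurrence_vectors`, `expand_seed_set`, `profile`, `disambiguate`) are hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus (pre-tokenized). "bank" stands in for an
# ambiguous Malayalam target word; none of this data comes from the paper.
SENTENCES = [
    ["deposit", "money", "bank", "loan"],
    ["loan", "interest", "money", "deposit"],
    ["river", "bank", "water", "fish"],
    ["fish", "swim", "water", "river"],
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_seed_set(seeds, vectors, k=1, exclude=frozenset()):
    """Grow a seed set by adding, for each seed, its k most similar words."""
    cluster = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue
        candidates = [w for w in vectors if w not in cluster and w not in exclude]
        candidates.sort(key=lambda w: cosine(vectors[seed], vectors[w]), reverse=True)
        # Keep only candidates that share at least one co-occurring word with the seed.
        cluster.update(w for w in candidates[:k] if cosine(vectors[seed], vectors[w]) > 0)
    return cluster


def profile(words, vectors):
    """Sum the co-occurrence vectors of `words` into one context profile."""
    total = Counter()
    for w in words:
        total.update(vectors.get(w, Counter()))
    return total


def disambiguate(context, clusters, vectors):
    """Return the sense whose cluster profile is most similar to the context profile."""
    ctx = profile(context, vectors)
    return max(clusters, key=lambda sense: cosine(ctx, profile(clusters[sense], vectors)))


vectors = cooccurrence_vectors(SENTENCES, window=2)
# Hypothetical hand-picked seeds; the target word itself is excluded from expansion.
clusters = {
    "finance": expand_seed_set(["money"], vectors, k=1, exclude={"bank"}),
    "river": expand_seed_set(["water"], vectors, k=1, exclude={"bank"}),
}
print(clusters)
print(disambiguate(["deposit", "loan"], clusters, vectors))  # finance
print(disambiguate(["fish", "river"], clusters, vectors))    # river
```

Here the seed "money" expands to {"money", "interest"} and "water" to {"water", "swim"}, and each test context is assigned to the cluster whose summed co-occurrence profile it overlaps most. Comparing summed profiles by cosine is only one plausible reading of "most similar sense cluster"; the paper's own measure, window size and seed selection from collocations may differ.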
- Is Part Of:
- Procedia technology. Volume 24 (2016)
- Journal:
- Procedia technology
- Issue:
- Volume 24 (2016)
- Issue Display:
- Volume 24, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 24
- Issue:
- 2016
- Issue Sort Value:
- 2016-0024-2016-0000
- Page Start:
- 1507
- Page End:
- 1513
- Publication Date:
- 2016
- Subjects:
- Word sense disambiguation -- Unsupervised methods -- Information extraction -- Collocations -- Context similarity.
Technology -- Congresses
Technology -- Periodicals
Engineering -- Congresses
Engineering -- Periodicals
Engineering
Technology
Conference proceedings
Periodicals
605
- Journal URLs:
- http://www.sciencedirect.com/science/journal/22120173
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.protcy.2016.05.106
- Languages:
- English
- ISSNs:
- 2212-0173
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2229.xml