Exploiting unbalanced specialized comparable corpora for bilingual lexicon extraction. (15th June 2016)
- Record Type:
- Journal Article
- Title:
- Exploiting unbalanced specialized comparable corpora for bilingual lexicon extraction. (15th June 2016)
- Main Title:
- Exploiting unbalanced specialized comparable corpora for bilingual lexicon extraction
- Authors:
- MORIN, EMMANUEL
- HAZEM, AMIR
- Editors:
- Rapp, Reinhard
- Sharoff, Serge
- Zweigenbaum, Pierre
- Abstract:
- The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced in terms of size. However, the historical context-based projection method is relatively insensitive to the size of each part of the comparable corpus. Within this context, we have carried out a study on the influence of unbalanced specialized comparable corpora and on the quality of bilingual terminology extraction by doing different experiments. Moreover, we have introduced a strategy into the context-based projection method to re-estimate word co-occurrence observations. This is done by using smoothing or prediction techniques that boost the observations of word co-occurrences which are mainly useful for the smallest part of an unbalanced comparable corpus. Our results show that the use of unbalanced specialized comparable corpora results in a significant improvement in the quality of extracted lexicons.
- Is Part Of:
- Natural language engineering. Volume 22:Part 4(2016)
- Journal:
- Natural language engineering
- Issue:
- Volume 22:Part 4(2016)
- Issue Display:
- Volume 22, Issue 4, Part 4 (2016)
- Year:
- 2016
- Volume:
- 22
- Issue:
- 4
- Part:
- 4
- Issue Sort Value:
- 2016-0022-0004-0004
- Page Start:
- 575
- Page End:
- 601
- Publication Date:
- 2016-06-15
- Subjects:
- Natural language processing (Computer science) -- Periodicals
- Software engineering -- Periodicals
- 006.35
- Journal URLs:
- http://journals.cambridge.org/action/displayJournal?jid=NLE
- DOI:
- 10.1017/S1351324916000140
- Languages:
- English
- ISSNs:
- 1351-3249
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 14465.xml