Two approaches to compilation of bilingual multi-word terminology lists from lexical resources. (28th July 2020)
- Record Type:
- Journal Article
- Title:
- Two approaches to compilation of bilingual multi-word terminology lists from lexical resources. (28th July 2020)
- Main Title:
- Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
- Authors:
- Šandrih, Branislava
- Krstev, Cvetana
- Stanković, Ranka
- Abstract:
- In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being varied. In the experiments presented in this paper, the source language was English, and the target language Serbian, and a selected domain was Library and Information Science, for which an aligned corpus exists, as well as a bilingual terminological dictionary. For term extraction, we used the FlexiTerm tool for the source language and a shallow parser for the target language, while for word alignment we used GIZA++. The evaluation results show that for the first approach the F1 score varies from 29.43% to 51.15%, while for the second it varies from 61.03% to 71.03%. On the basis of the evaluation results, we developed a binary classifier that decides whether a candidate pair, composed of aligned source and target terms, is valid. We trained and evaluated different classifiers on a list of manually labeled candidate pairs obtained after the implementation of our extraction system. The best results in a fivefold cross-validation setting were achieved with the Radial Basis Function Support Vector Machine classifier, giving an F1 score of 82.09% and accuracy of 78.49%.
- Is Part Of:
- Natural language engineering. Volume 26:Part 4(2020)
- Journal:
- Natural language engineering
- Issue:
- Volume 26:Part 4(2020)
- Issue Display:
- Volume 26, Issue 4, Part 4 (2020)
- Year:
- 2020
- Volume:
- 26
- Issue:
- 4
- Part:
- 4
- Issue Sort Value:
- 2020-0026-0004-0004
- Page Start:
- 455
- Page End:
- 479
- Publication Date:
- 2020-07-28
- Subjects:
- Language resources -- Machine translation -- Terminology extraction -- Text classification
- Natural language processing (Computer science) -- Periodicals
- Software engineering -- Periodicals
- 006.35
- Journal URLs:
- http://journals.cambridge.org/action/displayJournal?jid=NLE
- DOI:
- 10.1017/S1351324919000615
- Languages:
- English
- ISSNs:
- 1351-3249
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 14709.xml