Parallel fragments : Measuring their impact on translation performance. (May 2017)
- Record Type:
- Journal Article
- Title:
- Parallel fragments : Measuring their impact on translation performance. (May 2017)
- Main Title:
- Parallel fragments : Measuring their impact on translation performance
- Authors:
- Abdul-Rauf, Sadaf
Schwenk, Holger
Nawaz, Mohammad
- Abstract:
- Highlights: Phrase fragments have proved to be a valuable resource for improving translation and natural language generation performance. A novel approach to finding parallel fragments in comparable corpora is presented; it is simple and efficient to run. The difference in translation improvement between fragments extracted from related and non-related corpora is presented. The impact of parallel fragments is compared with that of parallel sentences, highlighting the significance of parallel segments. The proposed approach is compared theoretically with an earlier approach at every phase of the fragment extraction pipeline.
Abstract: The lack of parallel corpora has turned research towards other resources to fill the gap. Comparable corpora have proved valuable in this regard. Interestingly, apart from the parallel sentences extracted from comparable corpora, parallel phrase fragments have also proved beneficial for statistical machine translation. We present a novel approach, based on an efficient framework, for extracting parallel fragments from comparable corpora. Using the fragments as an additional corpus for translation, we obtain improvements of 0.88 and 0.89 BLEU points on test data for the Arabic–English and French–English systems, respectively. We also present a detailed analysis of the impact of fragments extracted from related vs. non-related corpora. A comparison of the impact of parallel fragments vs. parallel sentences is also presented, highlighting the significance of parallel segments for statistical machine translation. The article concludes with a rough comparative analysis of our approach and an existing fragment extraction technique at various stages of the fragment extraction pipeline.
- Is Part Of:
- Computer speech & language. Volume 43 (2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 43 (2017)
- Issue Display:
- Volume 43, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 43
- Issue:
- 2017
- Issue Sort Value:
- 2017-0043-2017-0000
- Page Start:
- 56
- Page End:
- 69
- Publication Date:
- 2017-05
- Subjects:
- Parallel fragments -- Statistical machine translation -- Comparable corpus
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.12.002
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 276.xml