Translating without in-domain corpus: Machine translation post-editing with online learning techniques. (July 2015)
- Record Type:
- Journal Article
- Title:
- Translating without in-domain corpus: Machine translation post-editing with online learning techniques. (July 2015)
- Main Title:
- Translating without in-domain corpus: Machine translation post-editing with online learning techniques
- Authors:
- Lagarda, Antonio L.
Ortiz-Martínez, Daniel
Alabau, Vicent
Casacuberta, Francisco
- Abstract:
- Highlights: We present a method to customize machine translation systems when in-domain data is not available. For that we perform an online learning automatic post-editing from ready-to-use generic machine translation systems. The results show that the method is very effective on rule-based machine translation systems. On statistical machine translation systems the method performs well if no in-domain data was used in the training. Finally, if there is not enough repetition our method has limited use.
Abstract: Globalization has dramatically increased the need of translating information from one language to another. Frequently, such translation needs should be satisfied under very tight time constraints. Machine translation (MT) techniques can constitute a solution to this overly complex problem. However, the documents to be translated in real scenarios are often limited to a specific domain, such as a particular type of medical or legal text. This situation seriously hinders the applicability of MT, since it is usually expensive to build a reliable translation system, no matter what technology is used, due to the linguistic resources that are required to build them, such as dictionaries, translation memories or parallel texts. In order to solve this problem, we propose the application of automatic post-editing in an online learning framework. Our proposed technique allows the human expert to translate in a specific domain by using a base translation system designed to work in a general domain whose output is corrected (or adapted to the specific domain) by means of an automatic post-editing module. This automatic post-editing module learns to make its corrections from user feedback in real time by means of online learning techniques. We have validated our system using different translation technologies to implement the base translation system, as well as several texts involving different domains and languages. In most cases, our results show significant improvements in terms of BLEU (up to 16 points) with respect to the baseline systems. The proposed technique works effectively when the n-grams of the document to be translated presents a certain rate of repetition, situation which is common according to the document-internal repetition property.
- Is Part Of:
- Computer speech & language. Volume 32(2015)
- Journal:
- Computer speech & language
- Issue:
- Volume 32(2015)
- Issue Display:
- Volume 32, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 32
- Issue:
- 2015
- Issue Sort Value:
- 2015-0032-2015-0000
- Page Start:
- 109
- Page End:
- 134
- Publication Date:
- 2015-07
- Subjects:
- Machine translation -- Statistical machine translation -- Interactive machine translation -- Automatic post-editing -- Online learning
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2014.10.004
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5431.xml