A unified framework and models for integrating translation memory into phrase-based statistical machine translation. (March 2019)
- Record Type:
- Journal Article
- Title:
- A unified framework and models for integrating translation memory into phrase-based statistical machine translation. (March 2019)
- Main Title:
- A unified framework and models for integrating translation memory into phrase-based statistical machine translation
- Authors:
- Liu, Yang
Wang, Kun
Zong, Chengqing
Su, Keh-Yih
- Abstract:
- Highlights: A unified framework and four models which integrate translation memory into PBMT for different scenarios. We propose a unified framework for integrating translation memory into phrase-based statistical machine translation. We implement a model (based on the unified framework) for adopting the TM database as the SMT training set. We implement a model (based on the unified framework) for adopting a different SMT training set from the same domain. We implement two models (based on the unified framework) for adopting a different cross-domain SMT training set scenario.
Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in TM matched and unmatched regions, a unified framework for integrating TM into phrase-based SMT is proposed in this paper. Unlike previous two-stage pipeline approaches, which directly merge TM results into the input sentences and subsequently let the SMT translate only the unmatched regions, the proposed framework refers to the corresponding TM information associated with each phrase during SMT decoding. Under this unified framework, several integrated models are proposed to incorporate different types of information extracted from TM to guide the SMT decoding. We thus let SMT implicitly and indirectly utilize global context with a local dependency model. Furthermore, the SMT phrase table is dynamically enhanced with TM phrase pairs when the TM database and the SMT training set are different. On a Chinese–English TM database, our experiments show that the proposed Model-I significantly improves over both SMT and TM when the SMT training set is also adopted as the TM database and when the fuzzy match score is over 0.4 (overall 3.5 BLEU points improvement and 2.6 TER points reduction). In addition, the proposed Model-II is significantly better than the TM and the SMT systems when the SMT training set and the TM database are different. Furthermore, the proposed Model-III outperforms both the TM and the SMT systems even when the SMT training set and the TM database are from different domains. Additionally, the proposed Model-IV further achieves significant improvements with the help of Top-N TM sentence pairs. Lastly, all our models significantly outperform the state-of-the-art approaches under all test conditions.
- Is Part Of:
- Computer speech & language. Volume 54(2019)
- Journal:
- Computer speech & language
- Issue:
- Volume 54(2019)
- Issue Display:
- Volume 54, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 54
- Issue:
- 2019
- Issue Sort Value:
- 2019-0054-2019-0000
- Page Start:
- 176
- Page End:
- 206
- Publication Date:
- 2019-03
- Subjects:
- Phrase-based machine translation -- Translation memory
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2018.09.006
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 8674.xml