A simplification–translation–restoration framework for domain adaptation in statistical machine translation: A case study in medical record translation. (March 2017)
- Record Type:
- Journal Article
- Title:
- A simplification–translation–restoration framework for domain adaptation in statistical machine translation: A case study in medical record translation. (March 2017)
- Main Title:
- A simplification–translation–restoration framework for domain adaptation in statistical machine translation: A case study in medical record translation
- Authors:
- Chen, Han-Bin
Huang, Hen-Hsen
Hsieh, An-Chang
Chen, Hsin-Hsi
- Abstract:
- Integration of in-domain knowledge into an out-of-domain statistical machine translation (SMT) system poses challenges due to the lack of resources. Lack of in-domain bilingual corpora is one such issue. In this paper, we propose a simplification–translation–restoration (STR) framework for domain adaptation in SMT systems. An SMT system to translate medical records from English to Chinese is taken as a case study. We identify the critical segments in a medical sentence and simplify them to alleviate the data sparseness problem in the out-of-domain SMT system. After translating the simplified sentence, the translations of these critical segments are restored to their proper positions. Besides the simplification pre-processing step and the restoration post-processing step, we also enhance the translation and language models in the STR framework by using pseudo bilingual corpora generated by the background MT system. In the experiments, we adapt an SMT system from a government document domain to a medical record domain. The results show the effectiveness of the STR framework.
- Is Part Of:
- Computer speech & language. Volume 42(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 42(2017)
- Issue Display:
- Volume 42, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 42
- Issue:
- 2017
- Issue Sort Value:
- 2017-0042-2017-0000
- Page Start:
- 59
- Page End:
- 80
- Publication Date:
- 2017-03
- Subjects:
- Cross-domain SMT -- Domain adaptation -- Statistical machine translation -- Medical document processing
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.08.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 704.xml