A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora. (July 2015)
- Record Type:
- Journal Article
- Title:
- A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora. (July 2015)
- Main Title:
- A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora
- Authors:
- Sánchez-Cartagena, Víctor M.
Pérez-Ortiz, Juan Antonio
Sánchez-Martínez, Felipe
- Abstract:
- Highlights: New approach to infer shallow-transfer rules from scarce parallel corpora. New rule formalism permits strong generalisation over the parallel corpus. First approach in which rule learning is rewritten as a global minimisation problem. Translation quality improves over previous approach with a smaller number of rules. Translation quality outperforms hand-coded rules for some language pairs.
Abstract: Statistical and rule-based methods are complementary approaches to machine translation (MT) that have different strengths and weaknesses. This complementarity has, over the last few years, resulted in the consolidation of a growing interest in hybrid systems that combine both data-driven and linguistic approaches. In this paper, we address the situation in which the amount of bilingual resources that is available for a particular language pair is not sufficiently large to train a competitive statistical MT system, but the cost and slow development cycles of rule-based MT systems cannot be afforded either. In this context, we formalise a new method that uses scarce parallel corpora to automatically infer a set of shallow-transfer rules to be integrated into a rule-based MT system, thus avoiding the need for human experts to handcraft these rules. Our work is based on the alignment template approach to phrase-based statistical MT, but the definition of the alignment template is extended to encompass different generalisation levels. It is also greatly inspired by the work of Sánchez-Martínez and Forcada (2009) in which alignment templates were also considered for shallow-transfer rule inference. However, our approach overcomes many relevant limitations of that work, principally those related to the inability to find the correct generalisation level for the alignment templates, and to select the subset of alignment templates that ensures an adequate segmentation of the input sentences by the rules eventually obtained. Unlike previous approaches in the literature, our formalism does not require linguistic knowledge about the languages involved in the translation. Moreover, it is the first time that conflicts between rules are resolved by choosing the most appropriate ones according to a global minimisation function rather than proceeding in a pairwise greedy fashion.
Experiments conducted using five different language pairs with the free/open-source rule-based MT platform Apertium show that translation quality significantly improves when compared to the method proposed by Sánchez-Martínez and Forcada (2009), and is close to that obtained using handcrafted rules. For some language pairs, our approach is even able to outperform them. Moreover, the resulting number of rules is considerably smaller, which eases human revision and maintenance.
- Is Part Of:
- Computer speech & language. Volume 32(2015)
- Journal:
- Computer speech & language
- Issue:
- Volume 32(2015)
- Issue Display:
- Volume 32, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 32
- Issue:
- 2015
- Issue Sort Value:
- 2015-0032-2015-0000
- Page Start:
- 46
- Page End:
- 90
- Publication Date:
- 2015-07
- Subjects:
- Machine translation -- Transfer rule inference -- Hybrid machine translation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2014.10.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5431.xml