Phrase-boundary model for statistical machine translation. (July 2016)
- Record Type:
- Journal Article
- Title:
- Phrase-boundary model for statistical machine translation. (July 2016)
- Main Title:
- Phrase-boundary model for statistical machine translation
- Authors:
- Salami, Shahram
Shamsfard, Mehrnoush
Khadivi, Shahram
- Abstract:
- Highlights: We proposed an SMT model that labels nonterminals with the boundary word classes of phrases. Word classes can be defined by POS tags and automatic word clustering. The proposed model was filtered by considering the alignment pattern of phrase pairs. Only limited patterns of rules are extracted from phrase pairs that are decomposable.
Abstract: This paper proposes a new probabilistic synchronous context-free grammar model for statistical machine translation. The model labels nonterminals with classes of boundary words on the target side of aligned phrase pairs. Rules are labeled with coarse-grained and fine-grained nonterminals using POS tags and word clusters trained on the target language corpus. Because the diversity of nonterminals makes the proposed model large, we have also proposed a novel approach for filtered rule extraction based on the alignment pattern of phrase pairs. Using limited patterns of rules, the extraction of hierarchical rules is restricted for phrase pairs that are decomposable into two aligned subphrases. The proposed filtered rule extraction decreases the model size and the decoding time considerably with no significant impact on the translation quality. Using BLEU as a metric in our experiments, the proposed model achieved a notable improvement over the state-of-the-art hierarchical phrase-based model in translation from Persian, French and Spanish into English. The approach is applicable to all languages, even under-resourced ones that have no linguistic tools.
- Is Part Of:
- Computer speech & language. Volume 38(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 38(2016)
- Issue Display:
- Volume 38, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 38
- Issue:
- 2016
- Issue Sort Value:
- 2016-0038-2016-0000
- Page Start:
- 13
- Page End:
- 27
- Publication Date:
- 2016-07
- Subjects:
- Statistical machine translation -- Hierarchical models -- Rules filtering
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2015.11.005
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2379.xml
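
Illustrative note on the abstract: the abstract states that nonterminals are labeled with the classes of the boundary words on the target side of a phrase pair. A minimal Python sketch of that idea follows. It is not the authors' implementation: the function name `boundary_label`, the `FIRST_LAST` label format, the `UNK` fallback class, and the toy word classes are all assumptions made here for illustration, since the record does not specify them.

```python
from typing import Dict, List


def boundary_label(
    target_phrase: List[str],
    word_class: Dict[str, str],
    unknown: str = "UNK",  # assumed fallback class for unseen words
) -> str:
    """Build a nonterminal label from the classes of the first and last
    target-side words of a phrase (label format is an assumption)."""
    if not target_phrase:
        raise ValueError("target phrase must contain at least one word")
    first = word_class.get(target_phrase[0], unknown)
    last = word_class.get(target_phrase[-1], unknown)
    return f"{first}_{last}"


# Toy word classes; in practice these could come from POS tags or from
# automatic word clustering, as the abstract mentions.
toy_classes = {"the": "DT", "red": "JJ", "house": "NN"}

print(boundary_label(["the", "red", "house"], toy_classes))
```

Coarser or finer nonterminals would correspond to using a smaller or larger set of word classes in `word_class`.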