Neural machine translation of low-resource languages using SMT phrase pair injection (17th May 2021)
- Record Type:
- Journal Article
- Title:
- Neural machine translation of low-resource languages using SMT phrase pair injection (17th May 2021)
- Main Title:
- Neural machine translation of low-resource languages using SMT phrase pair injection
- Authors:
- Sen, Sukanta
Hasanuzzaman, Mohammed
Ekbal, Asif
Bhattacharyya, Pushpak
Way, Andy
- Abstract:
- Neural machine translation (NMT) has recently shown promising results on publicly available benchmark datasets and is being rapidly adopted in various production systems. However, it requires a high-quality, large-scale parallel corpus, and it is not always possible to obtain a sufficiently large corpus, as doing so requires time, money, and professionals. Hence, many existing large-scale parallel corpora are limited to specific languages and domains. In this paper, we propose an effective approach to improve an NMT system in a low-resource scenario without using any additional data. Our approach augments the original training data with parallel phrases extracted from the original training data itself using a statistical machine translation (SMT) system. Our proposed approach is based on the gated recurrent unit (GRU) and transformer networks. We choose the Hindi–English and Hindi–Bengali datasets for the Health, Tourism, and Judicial (only for Hindi–English) domains. We train our NMT models for 10 translation directions, each using only 5–23k parallel sentences. Experiments show improvements in the range of 1.38–15.36 BiLingual Evaluation Understudy (BLEU) points over the baseline systems. Experiments also show that transformer models perform better than GRU models in low-resource scenarios. In addition, we find that our proposed method outperforms SMT, which is known to work better than neural models in low-resource scenarios, for some translation directions. To further show the effectiveness of our proposed model, we also apply our approach to another interesting NMT task, old-to-modern English translation, using a tiny parallel corpus of only 2.7k sentences. For this task, we use publicly available old and modern English text that is approximately 1000 years old. Evaluation on this task shows significant improvement over the baseline NMT.
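The core idea the abstract describes, extracting phrase pairs from the training corpus with an SMT pipeline and injecting them back as extra training pairs, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes word alignments are already available (an SMT toolkit such as Moses would normally produce them), uses the standard simplified phrase-extraction consistency criterion, and all function names and toy data are hypothetical.

```python
# Sketch of SMT-style phrase pair extraction followed by "injection":
# extracted phrase pairs are appended to the parallel corpus as extra
# (short) sentence pairs for NMT training. Illustrative only.

def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment, up to max_len words.

    `alignment` is a set of (src_index, tgt_index) links. A pair is kept when
    every link touching either span stays inside the other span (the standard
    SMT phrase-extraction criterion; unaligned boundary words are not expanded
    in this simplified version).
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(len(src), i1 + max_len)):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check on the induced target span [j1, j2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]),
                           " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy sentence pair with a monotone one-to-one alignment (hypothetical data).
src = "the house is small".split()
tgt = "das haus ist klein".split()
links = {(0, 0), (1, 1), (2, 2), (3, 3)}

phrase_pairs = extract_phrase_pairs(src, tgt, links)

# Injection step: each extracted phrase pair becomes an additional
# training pair alongside the original corpus.
corpus = [("the house is small", "das haus ist klein")]
augmented = corpus + sorted(phrase_pairs)
```

With the monotone toy alignment, every contiguous sub-span is consistent, so single words ("the" / "das"), short phrases ("the house" / "das haus"), and the full sentence are all extracted; the augmented corpus simply concatenates them with the original data.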
- Is Part Of:
- Natural language engineering. Volume 27, Part 3 (2021)
- Journal:
- Natural language engineering
- Issue:
- Volume 27, Part 3 (2021)
- Issue Display:
- Volume 27, Issue 3, Part 3 (2021)
- Year:
- 2021
- Volume:
- 27
- Issue:
- 3
- Part:
- 3
- Issue Sort Value:
- 2021-0027-0003-0003
- Page Start:
- 271
- Page End:
- 292
- Publication Date:
- 2021-05-17
- Subjects:
- Machine translation -- Translation technology
Natural language processing (Computer science) -- Periodicals
Software engineering -- Periodicals
006.35
- Journal URLs:
- http://journals.cambridge.org/action/displayJournal?jid=NLE
- DOI:
- 10.1017/S1351324920000303
- Languages:
- English
- ISSNs:
- 1351-3249
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 16848.xml