Morphological segmentation method for Turkic language neural machine translation. Issue 1 (1st January 2020)
- Record Type:
- Journal Article
- Title:
- Morphological segmentation method for Turkic language neural machine translation. Issue 1 (1st January 2020)
- Main Title:
- Morphological segmentation method for Turkic language neural machine translation
- Authors:
- Tukeyev, U.
Karibayeva, A.
Zhumanov, Zh.
- Editors:
- Pham, Duc
- Abstract:
- Dictionaries play an important role in neural machine translation (NMT). However, a large dictionary requires a significant amount of memory, which limits the application of NMT and can cause memory errors. This limitation can be addressed by segmenting each word into morphemes in the parallel source corpora. Therefore, this study introduces a new morphological segmentation approach for Turkic languages based on the complete set of endings (CSE), which reduces the vocabulary size of the source corpora. Herein, we demonstrate the proposed CSE-based morphological segmentation method for the Kazakh, Kyrgyz, and Uzbek languages and present the results of computational NMT experiments for the Kazakh language. The NMT experiment results show that, in comparison with byte-pair encoding (BPE)-based segmentation, the proposed CSE-based segmentation increases the bilingual evaluation understudy (BLEU) score by 0.5 and 0.2 points on average for the Kazakh–English and English–Kazakh pairs, respectively. Furthermore, in comparison with BPE-based segmentation, the proposed CSE-based segmentation approach reduced the vocabulary size in NMT by more than a factor of two. This feature of the proposed segmentation approach will be crucial for NMT as the size of the source corpora is increased to improve translation quality.
- Is Part Of:
- Cogent engineering. Volume 7:Issue 1(2020)
- Journal:
- Cogent engineering
- Issue:
- Volume 7:Issue 1(2020)
- Issue Display:
- Volume 7, Issue 1 (2020)
- Year:
- 2020
- Volume:
- 7
- Issue:
- 1
- Issue Sort Value:
- 2020-0007-0001-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-01-01
- Subjects:
- neural machine translation -- morphological segmentation -- Turkic languages -- Kazakh -- Kyrgyz -- Uzbek
Engineering -- Periodicals
Technology -- Periodicals
Engineering
Technology
Periodicals
620
- Journal URLs:
- http://bibpurl.oclc.org/web/73324
http://cogentoa.tandfonline.com/journal/oaen20
http://www.tandfonline.com/toc/oaen20/1/1
http://www.tandfonline.com/
http://cogentoa.tandfonline.com/journal/oaps20
- DOI:
- 10.1080/23311916.2020.1856500
- Languages:
- English
- ISSNs:
- 2331-1916
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21972.xml