Unsupervised Arabic dialect segmentation for machine translation. (23rd March 2022)
- Record Type:
- Journal Article
- Title:
- Unsupervised Arabic dialect segmentation for machine translation. (23rd March 2022)
- Main Title:
- Unsupervised Arabic dialect segmentation for machine translation
- Authors:
- Salloum, Wael
Habash, Nizar
- Abstract:
- Resource-limited and morphologically rich languages pose many challenges to natural language processing tasks. Their highly inflected surface forms inflate the vocabulary size and increase sparsity in an already scarce data situation. In this article, we present an unsupervised learning approach to vocabulary reduction through morphological segmentation. We demonstrate its value in the context of machine translation for dialectal Arabic (DA), the primarily spoken, orthographically unstandardized, morphologically rich and yet resource poor variants of Standard Arabic. Our approach exploits the existence of monolingual and parallel data. We show comparable performance to state-of-the-art supervised methods for DA segmentation.
- Is Part Of:
- Natural language engineering. Volume 28:Number 2(2022)
- Journal:
- Natural language engineering
- Issue:
- Volume 28:Number 2(2022)
- Issue Display:
- Volume 28, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 28
- Issue:
- 2
- Issue Sort Value:
- 2022-0028-0002-0000
- Page Start:
- 223
- Page End:
- 248
- Publication Date:
- 2022-03-23
- Subjects:
- Machine translation -- Morphology -- Arabic dialects -- Unsupervised learning
Natural language processing (Computer science) -- Periodicals
Software engineering -- Periodicals
006.35
- Journal URLs:
- http://journals.cambridge.org/action/displayJournal?jid=NLE
- DOI:
- 10.1017/S1351324920000455
- Languages:
- English
- ISSNs:
- 1351-3249
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 20655.xml