Extracting parallel fragments from comparable documents using a generative model. (January 2019)
- Record Type:
- Journal Article
- Title:
- Extracting parallel fragments from comparable documents using a generative model. (January 2019)
- Main Title:
- Extracting parallel fragments from comparable documents using a generative model
- Authors:
- Bakhshaei, Somayeh
Safabakhsh, Reza
Khadivi, Shahram
- Abstract:
- Highlights: Study of comparable corpora (CC) for extracting parallel information. Generative model for extracting parallel fragments from CC without an initial seed. Enhancing an SMT system using parallel fragments extracted from CC.
Abstract: Although parallel corpora are essential language resources for many natural language processing tasks, they are scarce or entirely unavailable for many language pairs. Comparable corpora, by contrast, are widely available and contain parallel fragments that can be used in applications such as statistical machine translation. In this research, we propose a generative model based on latent Dirichlet allocation for extracting parallel fragments from comparable documents without using any initial parallel data or bilingual lexicon. The experimental results show a significant improvement when the fragments extracted by the proposed method are used to augment an existing parallel corpus in a statistical machine translation system. According to human judgment, the accuracy of the proposed method on an English-Persian task is about 59.7%. The out-of-vocabulary error rate for the same task is reduced by 28%.
- Is Part Of:
- Computer speech & language. Volume 53 (2019)
- Journal:
- Computer speech & language
- Issue:
- Volume 53 (2019)
- Issue Display:
- Volume 53, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 53
- Issue:
- 2019
- Issue Sort Value:
- 2019-0053-2019-0000
- Page Start:
- 25
- Page End:
- 42
- Publication Date:
- 2019-01
- Subjects:
- Fragment extraction -- Comparable corpora -- Generative model -- Statistical machine translation -- Persian -- English -- German
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2018.07.002
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 7651.xml