Building and using multimodal comparable corpora for machine translation. (15th June 2016)
- Record Type:
- Journal Article
- Title:
- Building and using multimodal comparable corpora for machine translation. (15th June 2016)
- Main Title:
- Building and using multimodal comparable corpora for machine translation
- Authors:
- AFLI, HAITHEM
- BARRAULT, LOÏC
- SCHWENK, HOLGER
- Editors:
- Rapp, Reinhard
- Sharoff, Serge
- Zweigenbaum, Pierre
- Abstract:
- In recent decades, statistical approaches have significantly advanced the development of machine translation systems. However, the applicability of these methods depends directly on the availability of very large quantities of parallel data. Recent work has demonstrated that a comparable corpus can compensate for the shortage of parallel corpora. In this paper, we propose an alternative to comparable corpora containing text documents as resources for extracting parallel data: a multimodal comparable corpus with audio documents in the source language and text documents in the target language, built from the Euronews and TED web sites. The audio is transcribed by an automatic speech recognition system and translated with a baseline statistical machine translation system. We then use information retrieval in a large text corpus in the target language to extract parallel sentences/phrases. We evaluate the quality of the extracted data on an English-to-French translation task and show significant improvements over a state-of-the-art baseline.
- Is Part Of:
- Natural language engineering. Volume 22:Part 4(2016)
- Journal:
- Natural language engineering
- Issue:
- Volume 22:Part 4(2016)
- Issue Display:
- Volume 22, Issue 4, Part 4 (2016)
- Year:
- 2016
- Volume:
- 22
- Issue:
- 4
- Part:
- 4
- Issue Sort Value:
- 2016-0022-0004-0004
- Page Start:
- 603
- Page End:
- 625
- Publication Date:
- 2016-06-15
- Subjects:
- Natural language processing (Computer science) -- Periodicals
- Software engineering -- Periodicals
- 006.35
- Journal URLs:
- http://journals.cambridge.org/action/displayJournal?jid=NLE
- DOI:
- 10.1017/S1351324916000152
- Languages:
- English
- ISSNs:
- 1351-3249
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 14465.xml