On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks. (March 2016)

Record Type:: Journal Article
Title:: On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks. (March 2016)
Main Title:: On the feasibility of character n-grams pseudo-translation for Cross-Language Information Retrieval tasks
Authors:: Vilares, Jesús
Vilares, Manuel
Alonso, Miguel A.
Oakes, Michael P.
Abstract:: Highlights: We analyze the use of character n-grams both as indexing and translation units for CLIR tasks. We study their effective application and consistency across languages. We use an algorithm of our own for parallel text alignment at the subword level. Tests were performed for seven languages, with English as the target language. Results confirm their feasibility, consistency, and remarkable robustness, and show that their validity is not tied to a given implementation.
Abstract: The field of Cross-Language Information Retrieval draws on techniques close to both the Machine Translation and Information Retrieval fields, although in a context with characteristics of its own. The present study seeks to widen our knowledge of the effectiveness and applicability, in that field, of non-classical translation mechanisms that work at the character n-gram level. For the purpose of this study, an n-gram based system of this type has been developed. This system requires only a bilingual machine-readable dictionary of n-grams, automatically generated from parallel corpora, which serves to translate queries that have previously been n-grammed in the source language. n-Gramming is then used as an approximate string matching technique to perform monolingual text retrieval on the set of n-grammed documents in the target language. The tests for this work have been performed on CLEF collections for seven European languages, taking English as the target language. After an initial tuning …
Is Part Of:: Computer speech & language. Volume 36(2016)
Journal:: Computer speech & language
Issue:: Volume 36(2016)
Issue Display:: Volume 36, Issue 2016 (2016)
Year:: 2016
Volume:: 36
Issue:: 2016
Issue Sort Value:: 2016-0036-2016-0000
Page Start:: 136
Page End:: 164
Publication Date:: 2016-03
Subjects:: Cross-Language Information Retrieval -- Character n-grams -- Alignment algorithms for Machine Translation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2015.09.004
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 528.xml
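
Note on the abstract: the pipeline it describes (n-gram the source-language query, translate the n-grams through a bilingual n-gram dictionary, then match against n-grammed target-language documents) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the n-gram length of 4, the toy Spanish-to-English dictionary, and the unweighted overlap score, which stands in for whatever alignment-derived weights and retrieval model the authors actually used.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Split each whitespace-separated word into overlapping character n-grams.

    Words no longer than n are kept whole. The default n=4 is an assumption.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def translate_ngrams(grams, dictionary):
    """Replace each source n-gram by its target n-grams; unknown n-grams are dropped."""
    translated = []
    for gram in grams:
        translated.extend(dictionary.get(gram, []))
    return translated


def overlap_score(query_grams, document, n=4):
    """Count how often the translated query n-grams occur in the n-grammed document."""
    doc_counts = Counter(char_ngrams(document, n))
    return sum(doc_counts[gram] for gram in query_grams)


# Hypothetical toy dictionary; a real one would be generated from parallel corpora.
TOY_DICTIONARY = {"lech": ["milk"], "eche": ["milk"]}

query_grams = translate_ngrams(char_ngrams("leche"), TOY_DICTIONARY)
documents = ["fresh milk", "green tea"]
ranked = sorted(documents, key=lambda d: overlap_score(query_grams, d), reverse=True)
```

Because matching happens on n-grams rather than whole words, a query term can still hit a document that contains an inflected or misspelled variant sharing some of its n-grams, which is the approximate string matching behaviour the abstract refers to.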