Building a multi-domain comparable corpus using a learning to rank method. (15th June 2016)

Record Type:: Journal Article
Title:: Building a multi-domain comparable corpus using a learning to rank method. (15th June 2016)
Main Title:: Building a multi-domain comparable corpus using a learning to rank method
Authors:: RAHIMI, RAZIEH
SHAKERY, AZADEH
DADASHKARIMI, JAVID
ARIANNEZHAD, MOZHDEH
DEHGHANI, MOSTAFA
ESFAHANI, HOSSEIN NASR
Editors:: Rapp, Reinhard
Sharoff, Serge
Zweigenbaum, Pierre
Abstract:: Comparable corpora are key translation resources for both languages and domains with limited linguistic resources. The existing approaches for building comparable corpora are mostly based on ranking candidate documents in the target language for each source document using a cross-lingual retrieval model. These approaches also exploit other evidence of document similarity, such as proper names and publication dates, to build more reliable alignments. However, the importance of each evidence in the scores of candidate target documents is determined heuristically. In this paper, we employ a learning to rank method for ranking candidate target documents with respect to each source document. The ranking model is constructed by defining each evidence for similarity of bilingual documents as a feature whose weight is learned automatically. Learning feature weights can significantly improve the quality of alignments, because the reliability of features depends on the characteristics of both source and target languages of a comparable corpus. We also propose a method to generate appropriate training data for the task of building comparable corpora. We employed the proposed learning-based approach to build a multi-domain English–Persian comparable corpus which covers twelve different domains obtained from Open Directory Project. Experimental results show that the created alignments have high degrees of comparability. Comparison with existing approaches for building …
Is Part Of:: Natural language engineering. Volume 22:Part 4(2016)
Journal:: Natural language engineering
Issue:: Volume 22:Part 4(2016)
Issue Display:: Volume 22, Issue 4, Part 4 (2016)
Year:: 2016
Volume:: 22
Issue:: 4
Part:: 4
Issue Sort Value:: 2016-0022-0004-0004
Page Start:: 627
Page End:: 653
Publication Date:: 2016-06-15
Subjects:: Natural language processing (Computer science) -- Periodicals
Software engineering -- Periodicals
006.35
Journal URLs:: http://journals.cambridge.org/action/displayJournal?jid=NLE
DOI:: 10.1017/S1351324916000164
Languages:: English
ISSNs:: 1351-3249
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library HMNTS - ELD Digital store
Ingest File:: 14465.xml