Cross-Lingual Text Reuse Detection at sentence level for English–Urdu language pair. (September 2022)

Record Type:: Journal Article
Title:: Cross-Lingual Text Reuse Detection at sentence level for English–Urdu language pair. (September 2022)
Main Title:: Cross-Lingual Text Reuse Detection at sentence level for English–Urdu language pair
Authors:: Muneer, Iqra
Nawab, Rao Muhammad Adeel
Abstract:: In recent years, the problem of Cross-Lingual Text Reuse Detection (X-TRD) has gained the interest of researchers due to the availability of large digital repositories and automatic translation systems. These systems are readily available and openly accessible, which makes text easier to reuse across languages and harder to detect. In previous studies, different corpora and techniques have been developed for X-TRD at sentence/passage and document level for the English–Urdu language pair. However, there is a lack of large benchmark corpora and standard techniques for X-TRD for the English–Urdu language pair at the sentence level. To overcome this limitation, this study presents a large benchmark sentential cross-lingual (English–Urdu) corpus of 21,669 sentence pairs with simulated cases of X-TR, which are manually annotated at three levels of rewrite (Wholly Derived (WD) = 7,655, Partially Derived (PD) = 6,461, and Non Derived (ND) = 7,553). As a second major contribution, we have applied various state-of-the-art Cross-Lingual Sentence Transformers (CLST), and Translation plus Mono-lingual Analysis (T+MA) techniques, including N-gram Overlap (lexical), WordNet-based techniques (semantic), mono-lingual word embedding-based techniques, and Kullback–Leibler Distance (KLD) (probabilistic), on our proposed sentential corpus for X-TRD. For the binary classification, the best results are obtained (F1 = 0.94) using a combination of all CLST and T+MA techniques and a …
Is Part Of:: Computer speech & language. Volume 75(2022)
Journal:: Computer speech & language
Issue:: Volume 75(2022)
Issue Display:: Volume 75, Issue 2022 (2022)
Year:: 2022
Volume:: 75
Issue:: 2022
Issue Sort Value:: 2022-0075-2022-0000
Page Start:
Page End:
Publication Date:: 2022-09
Subjects:: Cross-Lingual Text Reuse -- Cross-Lingual Text Reuse Detection -- English–Urdu language pair -- Cross-lingual Sentence Transformer -- Translation plus Mono-Lingual Analysis
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2022.101381
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 21383.xml