Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information. Issue 6 (November 2016)
- Record Type:
- Journal Article
- Title:
- Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information. Issue 6 (November 2016)
- Main Title:
- Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information
- Authors:
- Ehsan, Nava
Shakery, Azadeh
- Abstract:
- Highlights:
Proposing a candidate retrieval model for cross-lingual plagiarism detection
The method relies on two levels of proximity information
Proposing a topic-based text segmentation method
Comparing the method with other cross-lingual plagiarism detection approaches
Showing improvements from text segmentation and positional language models
Abstract: The rapid growth of documents in different languages, the increased accessibility of electronic documents, and the availability of translation tools have caused cross-lingual plagiarism detection to receive increasing research attention in recent years. The task of cross-language plagiarism detection entails two main steps: candidate retrieval and assessment of pairwise document similarity. In this paper we examine candidate retrieval, where the goal is to find potential source documents of a suspicious text. Our proposed method for cross-language plagiarism detection is a keyword-focused approach. Since plagiarism usually occurs in parts of a text rather than throughout it, the texts must be segmented into fragments to detect local similarity. Therefore, we propose a topic-based segmentation algorithm that converts the suspicious document into a set of related passages. We then use a proximity-based model to retrieve the documents with the best-matching passages. Experiments show promising results for this important phase of cross-language plagiarism detection.
- Is Part Of:
- Information processing & management. Volume 52:Issue 6(2016:Nov.)
- Journal:
- Information processing & management
- Issue:
- Volume 52:Issue 6(2016:Nov.)
- Issue Display:
- Volume 52, Issue 6 (2016)
- Year:
- 2016
- Volume:
- 52
- Issue:
- 6
- Issue Sort Value:
- 2016-0052-0006-0000
- Page Start:
- 1004
- Page End:
- 1017
- Publication Date:
- 2016-11
- Subjects:
- Candidate document retrieval -- Cross-language plagiarism detection -- Text segmentation -- Proximity-based retrieval
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2016.04.006
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 7325.xml