An extensive study of authorship authentication of Arabic articles. Issue 1 (18th April 2017)
- Record Type:
- Journal Article
- Title:
- An extensive study of authorship authentication of Arabic articles. Issue 1 (18th April 2017)
- Main Title:
- An extensive study of authorship authentication of Arabic articles
- Authors:
- Al-Ayyoub, Mahmoud
Alwajeeh, Ahmed
Hmeidi, Ismail
- Abstract:
- Purpose: The authorship authentication (AA) problem is concerned with correctly attributing a text document to its corresponding author. Historically, this problem has been the focus of various studies focusing on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Another approach to this problem, known as the bag-of-words (BOW) approach, uses keywords occurrences/frequencies in each document to identify its author. Unlike the first one, this approach is more language-independent. This paper aims to study and compare both approaches focusing on the Arabic language which is still largely understudied despite its importance. Design/methodology/approach: Being a supervised learning problem, the authors start by collecting a very large data set of Arabic documents to be used for training and testing purposes. For the SF approach, they compute hundreds of SF, whereas, for the BOW approach, the popular term frequency-inverse document frequency technique is used. Both approaches are compared under various settings. Findings: The results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings. Practical implications: Numerous advantages of efficiently solving the AA problem are obtained in different fields of academia as well as the industry including literature, security, forensics, electronic markets and trading, etc. Another practical implication of this work is the public release of its sources. Specifically, some of the SF can be very useful for other problems such as sentiment analysis. Originality/value: This is the first study of its kind to compare the SF and BOW approaches for authorship analysis of Arabic articles. Moreover, many of the computed SF are novel, while other features are inspired by the literature. As SF are language-dependent and most existing papers focus on English, extra effort must be invested to adapt such features to Arabic text.
- Is Part Of:
- International journal of web information systems. Volume 13:Issue 1(2017)
- Journal:
- International journal of web information systems
- Issue:
- Volume 13:Issue 1(2017)
- Issue Display:
- Volume 13, Issue 1 (2017)
- Year:
- 2017
- Volume:
- 13
- Issue:
- 1
- Issue Sort Value:
- 2017-0013-0001-0000
- Page Start:
- 85
- Page End:
- 104
- Publication Date:
- 2017-04-18
- Subjects:
- Arabic text processing -- Authorship authentication -- Bag-of-words -- Stylometric features
World Wide Web -- Periodicals
Internet -- Periodicals
Information storage and retrieval systems -- Periodicals
004.678
- Journal URLs:
- http://www.emeraldinsight.com/info/journals/ijwis/ijwis.jsp
http://www.emeraldinsight.com/
http://www.troubador.co.uk/ijwis/
- DOI:
- 10.1108/IJWIS-03-2016-0011
- Languages:
- English
- ISSNs:
- 1744-0084
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.701180
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 50.xml