A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter. Issue 99 (July 2021)
- Record Type:
- Journal Article
- Title:
- A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter. Issue 99 (July 2021)
- Main Title:
- A real-time deep-learning approach for filtering Arabic low-quality content and accounts on Twitter
- Authors:
- Alharthi, Reem
Alhothali, Areej
Moria, Kawthar
- Abstract:
- Social networks have generated immense amounts of data that have been successfully utilized for research and business purposes. The approachability and immediacy of social media have also allowed ill-intentioned users to perform several harmful activities that include spamming, promoting, and phishing. These activities generate massive amounts of low-quality content that often exhibits duplicate, automated, inappropriate, or irrelevant content that subsequently affects users' satisfaction and imposes a significant challenge for other social media-based systems. Several real-time systems were developed to tackle this problem by focusing on filtering a specific kind of low-quality content. In this paper, we present a fine-grained real-time classification approach to identify several types of low-quality tweets (i.e., phishing, promoting, and spam tweets) written in Arabic. The system automatically extracts textual features using deep learning techniques without relying on hand-crafted features that are often time-consuming to be obtained and are tailored for a single type of low-quality content. This paper also proposes a lightweight model that utilizes a subset of the textual features to identify spamming Twitter accounts in a real-time setting. The proposed methods are evaluated on a real-world dataset (40,000 tweets and 1,000 accounts), showing superior performance in both models with accuracy and F1-scores of 0.98. The proposed system classifies a tweet in less than five milliseconds and an account in less than a second.
Highlights: This research shows that training a deep learning model on a dataset that includes several types of low-quality tweet can be an efficient solution to filter such content on a real-time setting. Two embedding methods (word- and character-level) are compared for the task of classifying tweets in either a legitimate or low-quality class using a dataset (40,000 tweets) collected through this project. We also show that Twitter account can be efficiently classified into spam or genuine profile using only the textual data of its recent tweets and a deep learning model. …
- Is Part Of:
- Information systems. Issue 99(2021)
- Journal:
- Information systems
- Issue:
- Issue 99(2021)
- Issue Display:
- Volume 99, Issue 99 (2021)
- Year:
- 2021
- Volume:
- 99
- Issue:
- 99
- Issue Sort Value:
- 2021-0099-0099-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07
- Subjects:
- Low-quality content in social networks -- Spam accounts -- Real-time detection system -- Deep learning techniques
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2021.101740
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16907.xml