Hate speech detection on Twitter using transfer learning. (July 2022)
- Record Type:
- Journal Article
- Title:
- Hate speech detection on Twitter using transfer learning. (July 2022)
- Main Title:
- Hate speech detection on Twitter using transfer learning
- Authors:
- Ali, Raza
Farooq, Umar
Arshad, Umair
Shahzad, Waseem
Beg, Mirza Omer
- Abstract:
- Highlights: The results show that transfer learning with the BERT architecture gives the best results on our dataset. This study also presents an ablation study using varying architectures of BERT. This study presents the performance of various baseline machine learning models on a hate speech dataset. Transfer learning is shown to perform well on a low-resource language such as Urdu.
Abstract: Social media has become an ultimate driver of social change in global society. The implications of events that take place in one corner of the world reverberate across the globe. This is because the huge amount of data generated on these platforms reaches the far corners of the world in the blink of an eye. Developers of these platforms face numerous challenges in keeping cyberspace as inclusive and healthy as possible. However, in recent years, the phenomena of offensive speech and hate speech have reared their ugly heads. Despite manual efforts, the scope of this problem is so immense that it cannot be tackled by concerted teams alone. There is a need for an automated technique that detects and removes offensive and hateful comments before their harmful impacts materialize. In this research work, we develop an Urdu language hate lexicon, and on the basis of this lexicon we formulate an annotated dataset of 10,526 Urdu tweets. Furthermore, as baseline experiments, we use various machine learning techniques for hate speech detection. In addition, we use transfer learning to exploit pre-trained FastText Urdu word embeddings and multi-lingual BERT embeddings for our task. Finally, we experiment with four different variants of BERT to exploit transfer learning, and we show that BERT, xlm-roberta and distil-Bert are able to achieve encouraging F1-scores of 0.68, 0.68 and 0.69 respectively, on our multi-class classification task. All these models succeeded to varying degrees, but all outperform a number of deep learning and machine learning baseline models.
- Is Part Of:
- Computer speech & language. Volume 74(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 74(2022)
- Issue Display:
- Volume 74, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 74
- Issue:
- 2022
- Issue Sort Value:
- 2022-0074-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-07
- Subjects:
- Deep learning -- Transfer learning -- Hate speech -- Social Media -- Machine Learning
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101365
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21011.xml