Hate speech and offensive language detection in Dravidian languages using deep ensemble framework. (September 2022)
- Record Type:
- Journal Article
- Title:
- Hate speech and offensive language detection in Dravidian languages using deep ensemble framework. (September 2022)
- Main Title:
- Hate speech and offensive language detection in Dravidian languages using deep ensemble framework
- Authors:
- Roy, Pradeep Kumar
Bhawal, Snehaan
Subalalitha, Chinnaudayar Navaneethakrishnan
- Abstract:
- Social networking platforms have gained widespread popularity and are used for various activities such as promoting products, sharing news and achievements, and many more. On the other hand, they are also used for spreading rumors, bullying people, and abusing certain groups of people with hateful words. Hateful and offensive posts must be detected and removed from social platforms as early as possible because such posts spread very quickly and tend to have many negative impacts on human beings. In the last few years, offensive content and hate speech detection has become a popular topic of research. Detecting hate speech on social platforms has many challenges, one of them being the use of code-mixed language. The majority of social media users post their messages in code-mixed languages such as Hindi–English, Tamil–English, Malayalam–English, Telugu–English and others. In this exhaustive study, we explore and compare the use of various machine learning and deep learning approaches. An ensemble model that combines the outcomes of transformer and deep learning-based models is suggested to detect hate speech and offensive language on social networking platforms. In experiments, the proposed weighted ensemble framework outperformed state-of-the-art models, achieving weighted F1-scores of 0.802 and 0.933 on the Malayalam and Tamil code-mixed datasets, respectively.
Highlights: Proposed a weighted ensemble framework for identifying hateful and offensive code-mixed posts on social platforms. Two code-mixed datasets, namely Tamil and Malayalam, are used in this research. The proposed model utilized the outcomes of deep learning and transformer-based models. Transformer-based models such as m-BERT, distilBERT, and xlm-RoBERTa performed better than the ML- and DL-based models. …
- Is Part Of:
- Computer speech & language. Volume 75(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 75(2022)
- Issue Display:
- Volume 75, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 75
- Issue:
- 2022
- Issue Sort Value:
- 2022-0075-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Dravidian language -- Hate speech -- Offensive language -- Transfer learning -- BERT -- Deep learning -- Low-resource
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101386
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21383.xml