A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection. Issue 4 (July 2021)
- Record Type:
- Journal Article
- Title:
- A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection. Issue 4 (July 2021)
- Main Title:
- A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection
- Authors:
- Pamungkas, Endang Wahyu
Basile, Valerio
Patti, Viviana
- Abstract:
- Hate speech is an increasingly important societal issue in the era of digital communication. Hateful expressions often make use of figurative language and, although they represent, in some sense, the dark side of language, they are also often prime examples of creative use of language. While hate speech is a global phenomenon, current studies on automatic hate speech detection are typically framed in a monolingual setting. In this work, we explore hate speech detection in low-resource languages by transferring knowledge from a resource-rich language, English, in a zero-shot learning fashion. We experiment with traditional and recent neural architectures, and propose two joint-learning models, using different multilingual language representations to transfer knowledge between pairs of languages. We also evaluate the impact of additional knowledge in our experiment, by incorporating information from a multilingual lexicon of abusive words. The results show that our joint-learning models achieve the best performance on most languages. However, a simple approach that uses machine translation and a pre-trained English language model achieves a robust performance. In contrast, Multilingual BERT fails to obtain a good performance in cross-lingual hate speech detection. We also experimentally found that the external knowledge from a multilingual abusive lexicon is able to improve the models' performance, specifically in detecting the positive class. The results of our experimental evaluation highlight a number of challenges and issues in this particular task. One of the main challenges is related to the issue of current benchmarks for hate speech detection, in particular how bias related to the topical focus in the datasets influences the classification performance. The insufficient ability of current multilingual language models to transfer knowledge between languages in the specific hate speech detection task also remains an open problem. However, our experimental evaluation and our qualitative analysis show how the explicit integration of linguistic knowledge from a structured abusive language lexicon helps to alleviate this issue.
Highlights: We propose a joint-learning architecture for cross-lingual hate speech detection. Our zero-shot approach transfers knowledge between different languages. We leverage one resource-rich language to inform models for lower-resource ones. We experiment on six lower-resource target languages. We experiment with three different multilingual language representation models. We investigate the impact of an external resource for knowledge transfer. We investigate the creative use of language conveying a derogatory meaning.
- Is Part Of:
- Information processing & management. Volume 58:Issue 4(2021)
- Journal:
- Information processing & management
- Issue:
- Volume 58:Issue 4(2021)
- Issue Display:
- Volume 58, Issue 4 (2021)
- Year:
- 2021
- Volume:
- 58
- Issue:
- 4
- Issue Sort Value:
- 2021-0058-0004-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07
- Subjects:
- Hate speech detection -- Cross-lingual classification -- Social media -- Transfer learning -- Zero-shot learning
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2021.102544
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16813.xml