Misogyny Detection in Twitter: a Multilingual and Cross-Domain Study. Issue 6 (November 2020)
- Record Type:
- Journal Article
- Title:
- Misogyny Detection in Twitter: a Multilingual and Cross-Domain Study. Issue 6 (November 2020)
- Main Title:
- Misogyny Detection in Twitter: a Multilingual and Cross-Domain Study
- Authors:
- Pamungkas, Endang Wahyu
Basile, Valerio
Patti, Viviana
- Abstract:
- Highlights: We conduct a broad and in-depth study on online misogyny, a relevant and timely task given that more and more episodes of hate speech and online harassment happen in social media. An extensive review of the state of the art in misogyny detection is presented. A state-of-the-art model to detect misogyny in social media is developed and evaluated on three different languages: English, Italian, and Spanish. We investigate the most predictive linguistic features to distinguish misogynistic content from non-misogynistic content. Relationships between misogyny and other abusive language phenomena are postulated, and empirically investigated with cross-dataset experiments. The feasibility of detecting misogyny in a multilingual environment is explored.
Abstract: The freedom of expression given by social media has a dark side: the growing proliferation of abusive content on these platforms. Misogynistic speech is a kind of abusive language, which can be simplified as hate speech targeting women, and it has become an increasingly relevant issue in recent years. AMI IberEval 2018 and AMI EVALITA 2018 were two shared tasks which mainly focused on tackling the problem of misogyny in Twitter, in three different languages, namely English, Italian, and Spanish. In this paper, we present an in-depth study on the phenomenon of misogyny in those three languages, focusing on three main objectives. Firstly, we investigate the most important features to detect misogyny and the issues which contribute to the difficulty of misogyny detection, by proposing a novel system and conducting a broad evaluation on this task. Secondly, we study the relationship between misogyny and other abusive language phenomena, by conducting a series of cross-domain classification experiments. Finally, we explore the feasibility of detecting misogyny in a multilingual environment, by carrying out cross-lingual classification experiments. Our system outperformed all state-of-the-art systems on all benchmark AMI datasets, in both subtask A and subtask B. Moreover, intriguing insights emerged from error analysis, in particular about the interaction between different but related abusive phenomena. Based on our cross-domain experiment, we conclude that misogyny is quite a specific kind of abusive language, and we found experimentally that it is different from sexism. Lastly, our cross-lingual experiments show promising results. Our proposed joint-learning architecture obtained robust performance across languages, which is worth exploring in further investigation.
- Is Part Of:
- Information processing & management. Volume 57:Issue 6(2020:Nov.)
- Journal:
- Information processing & management
- Issue:
- Volume 57:Issue 6(2020:Nov.)
- Issue Display:
- Volume 57, Issue 6 (2020)
- Year:
- 2020
- Volume:
- 57
- Issue:
- 6
- Issue Sort Value:
- 2020-0057-0006-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-11
- Subjects:
- Automatic misogyny identification -- Abusive language online -- Cross-domain classification -- Cross-lingual classification -- Social media
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2020.102360
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14754.xml