Offensive language detection in Tamil YouTube comments by adapters and cross-domain knowledge transfer. (November 2022)

Record Type:: Journal Article
Title:: Offensive language detection in Tamil YouTube comments by adapters and cross-domain knowledge transfer. (November 2022)
Main Title:: Offensive language detection in Tamil YouTube comments by adapters and cross-domain knowledge transfer
Authors:: Subramanian, Malliga
Ponnusamy, Rahul
Benhur, Sean
Shanmugavadivel, Kogilavani
Ganesan, Adhithiya
Ravi, Deepti
Shanmugasundaram, Gowtham Krishnan
Priyadharshini, Ruba
Chakravarthi, Bharathi Raja
Abstract:: Over the past few years, researchers have been focusing on the identification of offensive language on social networks. In places where English is not the primary language, social media users tend to post/comment using a code-mixed form of text. This poses various hitches in identifying offensive texts, and when combined with the limited resources available for languages such as Tamil, the task becomes considerably more challenging. This study undertakes multiple tests in order to detect potentially offensive texts in YouTube comments, made available through the HASOC-Offensive Language Identification track in Dravidian Code-Mix FIRE 2021. To detect the offensive texts, models based on traditional machine learning techniques, namely Bernoulli Naïve Bayes, Support Vector Machine, Logistic Regression, and K-Nearest Neighbor, were created. In addition, pre-trained multilingual transformer-based natural language processing models such as mBERT, MuRIL (Base and Large), and XLM-RoBERTa (Base and Large) were also attempted. These models were used as fine-tuner and adapter transformers. In essence, adapters and fine-tuners accomplish the same goal, but adapters function by adding layers to the main pre-trained model and freezing the pre-trained weights. This study shows that transformer-based models outperform machine learning approaches. Furthermore, in low-resource languages such as Tamil, adapter-based techniques surpass fine-tuned models in terms of both time and efficiency. …
Is Part Of:: Computer speech & language. Volume 76(2022)
Journal:: Computer speech & language
Issue:: Volume 76(2022)
Issue Display:: Volume 76, Issue 2022 (2022)
Year:: 2022
Volume:: 76
Issue:: 2022
Issue Sort Value:: 2022-0076-2022-0000
Publication Date:: 2022-11
Subjects:: Adapter -- Cross-domain analysis -- Finetuning -- HASOC -- Multilingual -- Machine learning models -- Offensive texts -- Transformer models
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2022.101404
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 21757.xml