Part-of-speech tagging for Arabic tweets using CRF and Bi-LSTM. (January 2021)

Record Type:: Journal Article
Title:: Part-of-speech tagging for Arabic tweets using CRF and Bi-LSTM. (January 2021)
Main Title:: Part-of-speech tagging for Arabic tweets using CRF and Bi-LSTM
Authors:: AlKhwiter, Wasan
Al-Twairesh, Nora
Abstract:: Highlights: POS taggers are developed for MSA and GLF variants of the Arabic language using CRF and BiLSTM. The gold standard annotated datasets that have been constructed for POS tagging are made accessible to the research community. An exploratory analysis of the behavior of using hashtags in Arabic tweets is presented, and this can be leveraged in future studies. The POS tagger for Arabic tweets using the BiLSTM achieves the best performance. Experiments show that there is no need for a dialect-specific POS tagger. Abstract: Over the past few years, Twitter has experienced massive growth and the volume of its online content has increased rapidly. This content has been a rich source for several studies that focused on natural language processing (NLP) research. However, Twitter data pose numerous challenges and obstacles to NLP tasks. For the English language, Twitter has an NLP tool that provides tweet-specific NLP tasks, which present significant opportunities for English NLP research and applications. Part-of-speech (POS) tagging for English tweets is one of the tasks that is offered and facilitated by such a tool. In contrast, only a few attempts have been made to develop POS taggers for Arabic content on Twitter. In this paper, we consider POS tagging, which is one of the NLP tasks that directly affects the performance of other subsequent text processing tasks. We introduce three manually annotated datasets for the POS tagging of Arabic tweets: the 'Mixed,' 'MSA,' …
Is Part Of:: Computer speech & language. Volume 65(2021)
Journal:: Computer speech & language
Issue:: Volume 65(2021)
Issue Display:: Volume 65, Issue 2021 (2021)
Year:: 2021
Volume:: 65
Issue:: 2021
Issue Sort Value:: 2021-0065-2021-0000
Publication Date:: 2021-01
Subjects:: Part-of-speech (POS) tagging -- Conditional random fields -- Bidirectional Long Short-Term Memory -- Arabic Tweets
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2020.101138
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 16886.xml