Named entity recognition using neural language model and CRF for Hindi language. (July 2022)
- Record Type:
- Journal Article
- Title:
- Named entity recognition using neural language model and CRF for Hindi language. (July 2022)
- Main Title:
- Named entity recognition using neural language model and CRF for Hindi language
- Authors:
- Sharma, Richa
Morwal, Sudha
Agarwal, Basant
- Abstract:
- Highlights: A state-of-art Hindi NER system based on MuRIL language model and CRF. Computation of token vector through various combination of encoder layers of MuRIL. Classification of entities using variants of MuRIL and mBERT based models. Evaluation of models on highly diverse set of Hindi named entities.
Abstract: Named Entity Recognition (NER) plays an important role in various Natural Language Processing (NLP) applications to extract the key information from a huge amount of unstructured text data. NER is a task of identifying and classifying the named entities into predefined categories for a given text. Recently, language models are highly appreciable in several NLP tasks as these state-of-the-art models result better even in resource scarcity. In this paper, we perform NER task on the Hindi language by incorporating the recently released multilingual language model MuRIL which stands for Multilingual Representation for Indian Languages. MuRIL is specially trained for 16 Indian languages. We develop a Hindi NER system using MuRIL with a conditional random field (CRF) layer and fine-tune the model on the ICON 2013 Hindi NER dataset. Further, in the proposed approach, we compute the addition of the last 4 layers representations of the MuRIL model instead of just using the last layer's representation and fine-tune the whole model. Several variants of this model are presented by applying different computations on token representations provided by different layers of 12-layered MuRIL architecture. The proposed model achieves state-of-the-art results as 87.89% precision, 83.74% recall and 85.77% F1-score and outperforms all other existing Hindi NER systems developed on the ICON 2013 dataset. Additionally, we develop a similar Hindi NER system by replacing the MuRIL language model with another state-of-the-art language model, called multilingual Bidirectional Encoder Representations from Transformers (mBERT) to analyze the efficiency of both language models over the Hindi NER task. …
- Is Part Of:
- Computer speech & language. Volume 74(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 74(2022)
- Issue Display:
- Volume 74, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 74
- Issue:
- 2022
- Issue Sort Value:
- 2022-0074-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-07
- Subjects:
- Neural network -- Sequence labeling -- MuRIL -- Multilingual BERT -- Transfer-learning -- Language models
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101356
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21011.xml