An innovative hybrid approach for extracting named entities from unstructured text data. (25th April 2019)
- Record Type:
- Journal Article
- Title:
- An innovative hybrid approach for extracting named entities from unstructured text data. (25th April 2019)
- Main Title:
- An innovative hybrid approach for extracting named entities from unstructured text data
- Authors:
- Thomas, Anu
Sangeetha, S.
- Abstract:
- Named entity recognition (NER) is the core part of information extraction: it automatically detects entities in natural language text and classifies them into predefined categories, such as the names of persons, organizations, and locations. The output of the NER task is crucial for many applications, including relation extraction, textual entailment, machine translation, and information retrieval. The literature shows that machine learning and deep learning approaches are the most widely used techniques for NER. However, for entity extraction, these approaches require a domain‐specific annotated data set. Our goal is to develop a hybrid NER system composed of rule‐based deep learning and clustering‐based approaches, which facilitates the extraction of generic entities (such as person, location, and organization) from natural language texts in domains that lack data sets labeled with generic named entities. The proposed approach exploits the advantages of both deep learning and clustering approaches, applied separately, combined with a knowledge‐based approach through a postprocessing module. We evaluated the proposed methodology on court cases (judgments) as a use case, since they contain generic named entities of different forms that are poorly represented in, or absent from, open‐source NER data sets. We also evaluated our hybrid models on two benchmark data sets, namely, Computational Natural Language Learning (CoNLL) 2003 and Open Knowledge Extraction (OKE) 2016. The experimental results on the benchmark data sets show that our hybrid models achieved substantially better F‐scores than other competitive systems. …
- Is Part Of:
- Computational intelligence. Volume 35:Number 4(2019)
- Journal:
- Computational intelligence
- Issue:
- Volume 35:Number 4(2019)
- Issue Display:
- Volume 35, Issue 4 (2019)
- Year:
- 2019
- Volume:
- 35
- Issue:
- 4
- Issue Sort Value:
- 2019-0035-0004-0000
- Page Start:
- 799
- Page End:
- 826
- Publication Date:
- 2019-04-25
- Subjects:
- clustering approach -- deep learning–based approach -- judicial domain -- knowledge‐based approach -- named entity recognition -- phrase embeddings
Artificial intelligence -- Periodicals
Computational linguistics -- Periodicals
006.3
- Journal URLs:
- http://www.blackwellpublishing.com/journal.asp?ref=0824-7935&site=1
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/coin.12214
- Languages:
- English
- ISSNs:
- 0824-7935
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.595000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 12061.xml