An MRC and adaptive positive‐unlabeled learning framework for incompletely labeled named entity recognition. Issue 11 (22nd August 2022)
- Record Type:
- Journal Article
- Title:
- An MRC and adaptive positive‐unlabeled learning framework for incompletely labeled named entity recognition. Issue 11 (22nd August 2022)
- Main Title:
- An MRC and adaptive positive‐unlabeled learning framework for incompletely labeled named entity recognition
- Authors:
- Zhang, Fu
Ma, Liangdong
Wang, Jiapeng
Cheng, Jingwei
- Abstract:
- Currently, named entity recognition (NER) is mainly evaluated on standard, well‐annotated data sets. However, constructing a well‐annotated data set consumes a lot of manpower and time. In many applications of NER, data sets may contain a lot of noise, and a large part of that noise comes from unlabeled entities. At present, the training process of most models treats unlabeled entities as nonentities, which causes these models to lean toward predicting most words of an input context as nonentities and greatly degrades their performance. In this paper, as a first attempt, we propose an adaptive positive‐unlabeled (adaPU) learning technique and integrate it into a machine reading comprehension (MRC) framework for NER, which can still perform well on data sets with a large proportion of unlabeled entities. In our framework, to address the above problem that a model may predict most words of an input context as nonentities, adaPU learning adjusts a loss coefficient of positive and negative samples. Moreover, instead of constructing just one fixed query for each entity type as input to MRC, we propose a new method of dynamically constructing multiple queries for each entity type, which also brings a slight performance improvement for NER. Accordingly, we explore new training and entity inference strategies for our learning framework. The experimental results show that our framework is effective on data sets that contain a large number of unlabeled entities. When the proportion of unlabeled entities reaches 50%, our framework remains effective and maintains F1‐scores above 80 on several data sets. The experiments also show that our framework achieves better or competitive performance on standard data sets. The ablation experiments further demonstrate that our MRC framework with adaPU learning and the dynamic query construction method can improve the performance of NER.
- Is Part Of:
- International journal of intelligent systems. Volume 37:Issue 11(2022)
- Journal:
- International journal of intelligent systems
- Issue:
- Volume 37:Issue 11(2022)
- Issue Display:
- Volume 37, Issue 11 (2022)
- Year:
- 2022
- Volume:
- 37
- Issue:
- 11
- Issue Sort Value:
- 2022-0037-0011-0000
- Page Start:
- 9580
- Page End:
- 9597
- Publication Date:
- 2022-08-22
- Subjects:
- adaptive positive‐unlabeled learning -- machine reading comprehension -- named entity recognition -- unlabeled entities
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
006.3
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1098-111X
https://www.hindawi.com/journals/ijis
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/int.23015
- Languages:
- English
- ISSNs:
- 0884-8173
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.310500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23902.xml