A Semi-automatic and low-cost method to learn patterns for named entity recognition*. (15th June 2017)
- Record Type:
- Journal Article
- Title:
- A Semi-automatic and low-cost method to learn patterns for named entity recognition*. (15th June 2017)
- Main Title:
- A Semi-automatic and low-cost method to learn patterns for named entity recognition*
- Authors:
- MARRERO, M.
URBANO, J.
- Abstract:
- Named Entity Recognition is a basic task in Information Extraction that aims at identifying entities of interest within full text documents. The patterns used to recognize entities can be rule based, as in the popular JAPE system. However, hand-crafting effective patterns is often difficult, and yet there is little research devoted to methods capable of learning human-readable patterns, possibly with arbitrary sets of features. In this paper, we present a semi-automatic method to generate both regular expressions and a subset of the JAPE language. It does not need a corpus annotated beforehand. Instead, it employs active learning and combines clustering with an algorithm that finds alignments between symbols present in the entities discovered during the learning process. The method currently supports a fixed set of character features and an arbitrary set of token features, but it can incorporate other kinds of features as well. Through several experiments with an English corpus, we show the ability of the method to generate effective patterns at a low annotation cost, and how it can successfully help in the annotation of brand new corpora.
- Is Part Of:
- Natural language engineering. Volume 24:Part 1(2018)
- Journal:
- Natural language engineering
- Issue:
- Volume 24:Part 1(2018)
- Issue Display:
- Volume 24, Issue 1, Part 1 (2018)
- Year:
- 2018
- Volume:
- 24
- Issue:
- 1
- Part:
- 1
- Issue Sort Value:
- 2018-0024-0001-0001
- Page Start:
- 39
- Page End:
- 75
- Publication Date:
- 2017-06-15
- Subjects:
- Natural language processing (Computer science) -- Periodicals
Software engineering -- Periodicals
006.35
- Journal URLs:
- http://journals.cambridge.org/action/displayJournal?jid=NLE
- DOI:
- 10.1017/S135132491700016X
- Languages:
- English
- ISSNs:
- 1351-3249
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 5947.xml