Leveraging active learning to reduce human effort in the generation of ground‐truth for entity resolution. (27th December 2019)
- Record Type:
- Journal Article
- Main Title:
- Leveraging active learning to reduce human effort in the generation of ground‐truth for entity resolution
- Authors:
- Fernandes de Araújo, Diego
Santos Pires, Carlos Eduardo
Cassimiro Nascimento, Dimas
- Abstract:
- Summary: Several methods of entity resolution (ER) have been developed in academia and industry over the years to identify duplicate entities (e.g., records) in datasets. To evaluate the efficacy of such methods, their results must be compared with a ground‐truth, a document containing all known duplicate record pairs in a dataset. In general, ground‐truths for real datasets are generated manually by inspecting all pairs of records in a dataset. This process is error-prone and has quadratic complexity with respect to the size(s) of the dataset(s), so it takes a long time to complete. In this context, some works propose (semi)automatic approaches for generating ground‐truths for the ER task. However, such approaches are either inapplicable to several domains or still require considerable manual effort. In this work, we propose GTGenERAL, a semiautomatic approach that combines the results of multiple ER algorithms with active learning to generate accurate ground‐truths with reduced manual effort. Experiments on real datasets show that GTGenERAL generates ground‐truths close to those produced by the state‐of‐the‐art approach while greatly reducing manual effort.
- Is Part Of:
- Computational intelligence. Volume 36:Number 2(2020)
- Journal:
- Computational intelligence
- Issue:
- Volume 36:Number 2(2020)
- Issue Display:
- Volume 36, Issue 2 (2020)
- Year:
- 2020
- Volume:
- 36
- Issue:
- 2
- Issue Sort Value:
- 2020-0036-0002-0000
- Page Start:
- 743
- Page End:
- 772
- Publication Date:
- 2019-12-27
- Subjects:
- active learning -- classification -- deduplication -- ground‐truth -- machine learning -- record linkage
Artificial intelligence -- Periodicals
Computational linguistics -- Periodicals
006.3
- Journal URLs:
- http://www.blackwellpublishing.com/journal.asp?ref=0824-7935&site=1
http://onlinelibrary.wiley.com/
- DOI:
- 10.1111/coin.12268
- Languages:
- English
- ISSNs:
- 0824-7935
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.595000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 13161.xml