Skyblocking for entity resolution. (November 2019)
- Record Type:
- Journal Article
- Title:
- Skyblocking for entity resolution. (November 2019)
- Main Title:
- Skyblocking for entity resolution
- Authors:
- Shao, Jingyu
Wang, Qing
Lin, Yu
- Abstract:
- In this paper, we introduce a novel framework for entity resolution blocking, called skyblocking, which aims to learn scheme skylines. In this framework, each blocking scheme is mapped to a point in a multi-dimensional scheme space in which each blocking measure represents one dimension. A scheme skyline contains the blocking schemes that are not dominated by any other blocking scheme in the scheme space. Learning scheme skylines efficiently poses two challenges: the class imbalance problem and the search space problem. We tackle these challenges by developing an active sampling strategy and a scheme extension strategy. Based on these two strategies, we develop three scheme skyline learning algorithms that efficiently learn scheme skylines under a given number of blocking measures and within a label budget limit. We experimentally verify, over five real-world datasets, that our algorithms outperform the baseline approaches in label efficiency, blocking quality and learning efficiency.
Highlights: A novel problem of scheme skyline learning is formulated. Three algorithms identify scheme skylines efficiently under a limited label budget. The class imbalance problem is tackled by active learning techniques.
- Is Part Of:
- Information systems. Volume 85(2019)
- Journal:
- Information systems
- Issue:
- Volume 85(2019)
- Issue Display:
- Volume 85, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 85
- Issue:
- 2019
- Issue Sort Value:
- 2019-0085-2019-0000
- Page Start:
- 30
- Page End:
- 43
- Publication Date:
- 2019-11
- Subjects:
- Entity resolution -- Blocking scheme -- Active learning -- Skyline
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2019.06.003
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11052.xml
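
The dominance relation described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes two blocking measures where higher is better (labelled PC and PQ here purely for illustration), and the scheme names and scores are hypothetical. It shows only what "not dominated by any other blocking scheme" means, not the learning algorithms of the paper.

```python
from typing import Dict, List, Tuple

# One point per blocking scheme; each coordinate is one blocking measure.
# Assumption: every measure is "higher is better".
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on every measure
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def scheme_skyline(schemes: Dict[str, Point]) -> List[str]:
    """Return the names of schemes not dominated by any other scheme."""
    return sorted(
        name
        for name, point in schemes.items()
        if not any(
            dominates(other_point, point)
            for other_name, other_point in schemes.items()
            if other_name != name
        )
    )


# Hypothetical schemes with made-up (PC, PQ) scores.
schemes = {
    "title": (0.95, 0.10),
    "title AND authors": (0.80, 0.60),
    "authors": (0.70, 0.40),
    "year": (0.99, 0.01),
}

print(scheme_skyline(schemes))  # ['title', 'title AND authors', 'year']
```

In this example "authors" is left out because "title AND authors" scores higher on both measures; the remaining three schemes each win on at least one measure, so none dominates another. The pairwise check is quadratic in the number of schemes, which is adequate for a sketch but says nothing about how the article searches the scheme space.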