Privacy preserving record linkage in the presence of missing values. (November 2017)
- Record Type:
- Journal Article
- Title:
- Privacy preserving record linkage in the presence of missing values. (November 2017)
- Main Title:
- Privacy preserving record linkage in the presence of missing values
- Authors:
- Chi, Yuan
Hong, Jun
Jurek, Anna
Liu, Weiru
O'Reilly, Dermot
- Abstract:
- Highlights: It is proposed that the missing value in a record is handled by utilising the values of the corresponding fields in the k-NNs of this record. The proposed method for dealing with missing values allows the use of the traditional blocking techniques to handle the scalability issue. The existing Bloom filter protocol has been adapted to address both issues of missing values and privacy preservation.
Abstract: The problem of record linkage is to identify records from two datasets, which refer to the same entities (e.g. patients). A particular issue of record linkage is the presence of missing values in records, which has not been fully addressed. Another issue is how privacy and confidentiality can be preserved in the process of record linkage. In this paper, we propose an approach for privacy preserving record linkage in the presence of missing values. For any missing value in a record, our approach imputes the similarity measure between the missing value and the value of the corresponding field in any of the possible matching records from another dataset. We use the k-NNs (k Nearest Neighbours in the same dataset) of the record with the missing value and their distances to the record for similarity imputation. For privacy preservation, our approach uses the Bloom filter protocol in the settings of both standard privacy preserving record linkage without missing values and privacy preserving record linkage with missing values. We have conducted an experimental evaluation using three pairs of synthetic datasets with different rates of missing values. Our experimental results show the effectiveness and efficiency of our proposed approach.
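Illustrative note: the abstract describes two ingredients, Bloom filter encoding of field values and k-NN based imputation of a similarity score for a missing field. The Python sketch below shows one way these ideas could fit together. It is not the authors' implementation: the bigram tokenisation, the double-hashing scheme, the Dice coefficient, the similarity-weighted average, and all names (`bloom`, `dice`, `record_distance`, `impute_similarity`) and parameter defaults are assumptions made for illustration, since the record does not give the paper's formulas.

```python
import hashlib


def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom(value, size=256, num_hashes=4):
    """Encode a string as the set of bit positions set in a Bloom filter.

    Assumption: double hashing with SHA-1 and MD5, a common choice in the
    Bloom filter linkage literature, not necessarily the paper's.
    """
    bits = set()
    for gram in bigrams(value):
        h1 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.md5(gram.encode()).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return frozenset(bits)


def dice(a, b):
    """Dice coefficient between two Bloom filters given as bit-position sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def record_distance(r, s, fields):
    """Mean (1 - Dice) over the fields that are present in both records."""
    shared = [f for f in fields if r[f] is not None and s[f] is not None]
    if not shared:
        return 1.0
    return sum(1 - dice(r[f], s[f]) for f in shared) / len(shared)


def impute_similarity(record, field, candidate_value, same_dataset, k=3):
    """Estimate the similarity between a missing field and a candidate value.

    Assumption: the k nearest neighbours of `record` within its own dataset
    stand in for the missing value, and their similarities to the candidate
    are averaged with weights of (1 - distance to `record`).
    """
    others = [f for f in record if f != field]
    neighbours = sorted(
        (s for s in same_dataset if s is not record and s[field] is not None),
        key=lambda s: record_distance(record, s, others),
    )[:k]
    weights = [1 - record_distance(record, s, others) for s in neighbours]
    total = sum(weights)
    if total == 0:
        return 0.0
    weighted = sum(w * dice(s[field], candidate_value)
                   for w, s in zip(weights, neighbours))
    return weighted / total


# Hypothetical usage: the surname of `target` is missing in dataset A.
target = {"first": bloom("anna"), "last": None}
dataset_a = [
    target,
    {"first": bloom("anna"), "last": bloom("jurek")},
    {"first": bloom("anne"), "last": bloom("jurek")},
]
candidate_last = bloom("jurek")  # surname of a possible match in dataset B
estimate = impute_similarity(target, "last", candidate_last, dataset_a, k=2)
```

In this sketch only Bloom filter encodings are compared, so the neighbour search and the imputed score never need the plaintext values; how the actual protocol distributes these computations between parties is not stated in this record.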
- Is Part Of:
- Information systems. Volume 71(2017)
- Journal:
- Information systems
- Issue:
- Volume 71(2017)
- Issue Display:
- Volume 71, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 71
- Issue:
- 2017
- Issue Sort Value:
- 2017-0071-2017-0000
- Page Start:
- 199
- Page End:
- 210
- Publication Date:
- 2017-11
- Subjects:
- Record linkage -- Probabilistic record linkage -- Privacy preserving record linkage -- Missing values -- k-Nearest Neighbours -- Data encryption
Database management -- Periodicals
Electronic data processing -- Periodicals
Bases de données -- Gestion -- Périodiques
Informatique -- Périodiques
Database management
Electronic data processing
Periodicals
005.7
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064379
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.is.2017.07.001
- Languages:
- English
- ISSNs:
- 0306-4379
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4496.367300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11505.xml