A new computationally efficient algorithm for record linkage with field dependency and missing data imputation. (January 2018)
- Record Type:
- Journal Article
- Main Title:
- A new computationally efficient algorithm for record linkage with field dependency and missing data imputation
- Authors:
- Ferguson, John
Hannigan, Ailish
Stack, Austin
- Abstract:
- Highlights: Record linkage algorithms aim to identify pairs of records in two or more databases, that correspond to the same individual. Here, we propose a novel record linkage algorithm that incorporates missing data and correlation between field agreement indicators. We demonstrate that our algorithm can achieve more accurate record linkage, compared to the original probabilistic linkage algorithm of Fellegi and Sunter. Our algorithm is computationally efficient, and can be used to link large databases consisting of millions of records. An R-package, corlink, has been developed to implement the new algorithm and can be downloaded from the CRAN repository.
Abstract: Record linkage algorithms aim to identify pairs of records that correspond to the same individual from two or more datasets. In general, fields that are common to both datasets are compared to determine which record-pairs to link. The classic model for probabilistic linkage was proposed by Fellegi and Sunter and assumes that individual fields common to both datasets are completely observed, and that the field agreement indicators are conditionally independent within the subsets of record pairs corresponding to the same and differing individuals. Herein, we propose a novel record linkage algorithm that is independent of these two baseline assumptions. We demonstrate improved performance of the algorithm in the presence of missing data and correlation patterns between the agreement indicators. The algorithm is computationally efficient and can be used to link large databases consisting of millions of record pairs. An R-package, corlink, has been developed to implement the new algorithm and can be downloaded from the CRAN repository.
- Is Part Of:
- International journal of medical informatics. Volume 109(2018)
- Journal:
- International journal of medical informatics
- Issue:
- Volume 109(2018)
- Issue Display:
- Volume 109, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 109
- Issue:
- 2018
- Issue Sort Value:
- 2018-0109-2018-0000
- Page Start:
- 70
- Page End:
- 75
- Publication Date:
- 2018-01
- Subjects:
- Record linkage -- Fellegi/Sunter -- Conditional independence -- EM-algorithm -- Log-linear models
Medical informatics -- Periodicals
Information science -- Periodicals
Computers -- Periodicals
Medical technology -- Periodicals
Medical Informatics -- Periodicals
Technology, Medical -- Periodicals
Computers
Information science
Medical informatics
Medical technology
Electronic journals
Periodicals
610.285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/13865056
http://www.clinicalkey.com/dura/browse/journalIssue/13865056
http://www.clinicalkey.com.au/dura/browse/journalIssue/13865056
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ijmedinf.2017.10.021
- Languages:
- English
- ISSNs:
- 1386-5056
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4542.345250
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5509.xml