A Bayesian Approach to Graphical Record Linkage and Deduplication. Issue 516 (1st October 2016)
- Record Type:
- Journal Article
- Title:
- A Bayesian Approach to Graphical Record Linkage and Deduplication. Issue 516 (1st October 2016)
- Main Title:
- A Bayesian Approach to Graphical Record Linkage and Deduplication
- Authors:
- Steorts, Rebecca C.
Hall, Rob
Fienberg, Stephen E.
- Abstract:
- ABSTRACT: We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate transitive linkage probabilities across records (and represent this visually), and propagate the uncertainty of record linkage into later analyses. Our method makes it particularly easy to integrate record linkage with post-processing procedures such as logistic regression, capture–recapture, etc. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previous record linkage approaches, despite the high-dimensional parameter space. We illustrate our method using longitudinal data from the National Long Term Care Survey and with data from the Italian Survey on Household and Wealth, where we assess the accuracy of our method and show it to be better in terms of error rates and empirical scalability than other approaches in the literature. Supplementary materials for this article are available online.
- Is Part Of:
- Journal of the American Statistical Association. Volume 111:Issue 516(2016)
- Journal:
- Journal of the American Statistical Association
- Issue:
- Volume 111:Issue 516(2016)
- Issue Display:
- Volume 111, Issue 516 (2016)
- Year:
- 2016
- Volume:
- 111
- Issue:
- 516
- Issue Sort Value:
- 2016-0111-0516-0000
- Page Start:
- 1660
- Page End:
- 1672
- Publication Date:
- 2016-10-01
- Subjects:
- Bayesian methods -- Blocking -- Clustering -- Entity resolution -- Hybrid Markov chain Monte Carlo -- Linkage structure
Statistics -- Periodicals
Statistiques -- Périodiques
États-Unis -- Statistiques -- Périodiques
519.5
- Journal URLs:
- http://www.jstor.org/journals/01621459.html
http://www.ingentaconnect.com/content/asa/jasa
http://www.tandfonline.com/loi/uasa20
http://www.tandfonline.com/
- DOI:
- 10.1080/01621459.2015.1105807
- Languages:
- English
- ISSNs:
- 0162-1459
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4694.000000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 6277.xml