Estimating Precision and Recall for Deterministic and Probabilistic Record Linkage

RIS ID

129526

Publication Details

Chipperfield, J., Hansen, N. & Rossiter, P. (2018). Estimating Precision and Recall for Deterministic and Probabilistic Record Linkage. International Statistical Review, 86 (2), 219-236.

Abstract

Linking administrative, survey and census files to enhance dimensions such as time and breadth or depth of detail is now common. Because a unique person identifier is often not available, records belonging to two different units (e.g. people) may be incorrectly linked. Estimating the proportion of links that are correct, called Precision, is difficult because, even after clerical review, there will remain uncertainty about whether a link is in fact correct or incorrect. Measures of Precision are useful when deciding whether or not it is worthwhile linking two files, when comparing alternative linking strategies and as a quality measure for estimates based on the linked file. This paper proposes an estimator of Precision for a linked file that has been created by either deterministic (or rules-based) or probabilistic (where evidence for a link being a match is weighted against the evidence that it is not a match) linkage, both of which are widely used in practice. This paper shows that the proposed estimators perform well.
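
As a point of reference for the quantities named in the title, the following minimal sketch computes Precision (the proportion of links that are correct, as defined in the abstract) and Recall (under the standard definition, the proportion of true matches that were linked) for a toy example. It is not the estimator proposed in the paper: it assumes the true match status of every record pair is known, which the abstract notes is generally not the case in practice. The function name, record identifiers and data are hypothetical.

```python
def precision_recall(links, true_matches):
    """Return (precision, recall) for a set of declared links.

    links        -- iterable of (id_a, id_b) pairs produced by the linkage
    true_matches -- iterable of (id_a, id_b) pairs that truly belong to the
                    same unit (ASSUMED known here; in practice it is not)
    """
    links = set(links)
    true_matches = set(true_matches)
    correct = links & true_matches  # links that are true matches

    # Precision: share of declared links that are correct.
    precision = len(correct) / len(links) if links else float("nan")
    # Recall: share of true matches that were actually linked.
    recall = len(correct) / len(true_matches) if true_matches else float("nan")
    return precision, recall


# Hypothetical toy data: four declared links, five true matches.
links = {("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("a4", "b4")}
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")}

precision, recall = precision_recall(links, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four declared links are correct and three of the five true matches are found. When the truth is uncertain even after clerical review, these ratios cannot be computed directly, which is the problem the paper's estimator is meant to address.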

Please refer to publisher version or contact your library.

Link to publisher version (DOI)

http://dx.doi.org/10.1111/insr.12246