Secondary analysis of linked data
A common type of linked data analysis is regression analysis, in which the relationship between a response variable and a set of explanatory variables is explored by fitting a regression model to the linked data. The bias correction ideas explored in Kim and Chambers were motivated by secondary analysis: these authors assume a model for linkage errors that requires only non-sensitive summary information about the performance of the data linkage method. This chapter takes the same approach, using linear regression analysis to illustrate the basic ideas. It discusses the measurement issues that arise after a linking procedure, namely correct links, incorrect links and non-links, and how the errors due to linkage and non-linkage can be characterised. The chapter describes models for different types of linking error: linkage errors under binary linking, linkage errors under multi-linking, and incomplete linking. It illustrates the use of regression analysis for incomplete binary-linked data and for multi-linked data.