Degree Name

Doctor of Philosophy


School of Mathematics and Applied Statistics


Probabilistic matching of records from different data sets is often used to create linked data sets for research in health, epidemiology, economics, demography and sociology. Because this type of matching is not exact, it can produce linkage errors, which in turn can lead to bias and increased variability when standard statistical estimation techniques are applied to the linked data. Recently, an inferential framework for statistical modelling using probabilistically linked data has been defined and used to develop modified estimation methods for regression models, based on the assumption that the correctly linked records are mutually uncorrelated. In real life, however, measurements are usually made on clusters of correlated statistical units, such as people in a family, patients in a hospital or students in a school, and linear mixed models are often used to analyse such data.
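The effect of linkage errors on a standard estimator can be illustrated with a small simulation. The sketch below is not taken from the thesis: the function name `simulate_naive_slope`, the sample size, the noise level and the linkage error mechanism (each record keeps its true match with probability `p_correct`, and the remaining records are shuffled among themselves) are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_naive_slope(n=5000, beta=2.0, p_correct=0.8, seed=0):
    """Return (slope fitted on true links, slope fitted on error-prone links)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta * x + rng.normal(scale=0.5, size=n)

    # Hypothetical linkage error mechanism: a record keeps its true match
    # with probability p_correct; the others are shuffled among themselves
    # (a few may land back on their true match by chance).
    linked = np.arange(n)
    wrong = np.flatnonzero(rng.random(n) > p_correct)
    linked[wrong] = rng.permutation(wrong)
    y_star = y[linked]  # response values as they appear in the linked file

    slope_true = np.polyfit(x, y, 1)[0]
    slope_naive = np.polyfit(x, y_star, 1)[0]
    return slope_true, slope_naive

slope_true, slope_naive = simulate_naive_slope()
print(f"true links: {slope_true:.3f}, linked data: {slope_naive:.3f}")
```

Under these assumptions the slope fitted to the linked data is pulled towards zero, roughly in proportion to the share of correct links, which is the kind of bias the naive analysis suffers from.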

In this thesis we show how this inferential framework can be used to develop unbiased regression parameter estimates when fitting a linear mixed model to probabilistically linked data. Furthermore, since estimation of variance components is also an important objective when fitting a mixed model, we develop modifications to standard methods of variance components estimation that account for linkage error. In particular, we focus on three widely used methods: analysis of variance (ANOVA), maximum likelihood (ML) and restricted maximum likelihood (REML). A simulation study investigates the bias and variability of the parameter estimates obtained with the methods developed in this work, and its results indicate that all of these methods perform reasonably well.
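As a rough illustration of how an adjustment for linkage error can remove bias in the fixed effects, the sketch below solves the estimating equation X'(y* - A X beta) = 0, where y* is the linked response and A is the expected linkage matrix. Everything specific here is an assumption made for the example rather than the thesis's own method: a single block of records with a known probability `lam` of correct linkage, an exchangeable form A = (lam - gam) I + gam 11' with gam = (1 - lam)/(n - 1), a random-intercept data generator, and the hypothetical names `adjusted_fixed_effects` and `demo`. Variance components are not estimated.

```python
import numpy as np

def adjusted_fixed_effects(X, y_star, lam):
    """Solve X'(y* - A X beta) = 0 for beta under an assumed exchangeable A."""
    n = X.shape[0]
    gam = (1.0 - lam) / (n - 1)
    # A = (lam - gam) I + gam 11', so A X can be formed without building A.
    AX = (lam - gam) * X + gam * X.sum(axis=0)
    return np.linalg.solve(X.T @ AX, X.T @ y_star)

def demo(n_clusters=500, cluster_size=10, beta=(1.0, 2.0), lam=0.8, seed=1):
    """Return (naive OLS estimate, adjusted estimate) of the fixed effects."""
    rng = np.random.default_rng(seed)
    n = n_clusters * cluster_size
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    # Random intercept shared by the records in each cluster (illustrative).
    u = np.repeat(rng.normal(scale=0.5, size=n_clusters), cluster_size)
    y = X @ np.asarray(beta) + u + rng.normal(scale=0.5, size=n)

    # Assumed linkage errors: true match kept with probability lam,
    # the remaining records shuffled among themselves.
    linked = np.arange(n)
    wrong = np.flatnonzero(rng.random(n) > lam)
    linked[wrong] = rng.permutation(wrong)
    y_star = y[linked]

    naive = np.linalg.lstsq(X, y_star, rcond=None)[0]
    adjusted = adjusted_fixed_effects(X, y_star, lam)
    return naive, adjusted

naive, adjusted = demo()
print("naive:", naive.round(3), "adjusted:", adjusted.round(3))
```

In this setting the naive slope is attenuated while the adjusted slope is close to the value used to generate the data. The sketch relies on `lam` being known; in practice it would have to be estimated, for example from a clerical review sample.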

We also investigate an application to longitudinal modelling. Here we focus on fitting linear mixed models to linked longitudinal registers, where more than two registers are linked and linkage errors can occur in any of them. The results from a simulation study illustrate the performance of this approach and show that, although efficiency improves relative to the naive method that ignores linkage errors, some issues still need further investigation and improvement.