Doctor of Philosophy
School of Mathematics and Applied Statistics
Lago, Luise Patricia, Imputation of household survey data using mixed models, Doctor of Philosophy thesis, School of Mathematics and Applied Statistics, University of Wollongong, 2015. http://ro.uow.edu.au/theses/4369
Household surveys collect information about a household and data items relating to one or more people within the household. With response rates falling, an efficient strategy for dealing with missing data is essential. People within a household are more likely to share characteristics than a random group of people, and this homogeneity can be exploited when forming strategies for dealing with nonresponse. Amongst single-value imputation methods, linear models and donor models are commonly used, but they generally ignore relationships within households. These methods use auxiliary variables available for nonrespondents to replace each missing value with a single value, for example a mean or a donor value. This thesis focuses on imputation strategies for items missing at the person level. The goal is to use correlation structures within households to form improved imputed values for missing data.
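To make the idea of single-value imputation concrete, the following minimal sketch shows a class-mean imputation and a random within-class donor imputation. It is an illustration only and not the implementation used in the thesis: the record layout, the field names (`hh`, `age_group`, `income`), the function names and the toy values are all assumptions made for this example. Note that neither method looks at the household identifier, which is the limitation described above.

```python
import random
from collections import defaultdict
from statistics import mean


def group_observed(records, class_key, value_key):
    """Collect the observed (non-missing) values for each imputation class."""
    observed = defaultdict(list)
    for rec in records:
        if rec[value_key] is not None:
            observed[rec[class_key]].append(rec[value_key])
    return observed


def impute_class_mean(records, class_key, value_key):
    """Replace each missing value with the mean of respondents in the same class."""
    observed = group_observed(records, class_key, value_key)
    result = []
    for rec in records:
        new = dict(rec)  # copy so the input records are left unchanged
        donors = observed[new[class_key]]
        if new[value_key] is None and donors:
            new[value_key] = mean(donors)
        result.append(new)
    return result


def impute_random_donor(records, class_key, value_key, seed=0):
    """Replace each missing value with a randomly chosen respondent value from the same class."""
    rng = random.Random(seed)
    observed = group_observed(records, class_key, value_key)
    result = []
    for rec in records:
        new = dict(rec)
        donors = observed[new[class_key]]
        if new[value_key] is None and donors:
            new[value_key] = rng.choice(donors)
        result.append(new)
    return result


# Hypothetical toy data: "hh" is a household identifier, None marks item nonresponse.
people = [
    {"hh": 1, "age_group": "25-44", "income": 52000.0},
    {"hh": 1, "age_group": "25-44", "income": None},
    {"hh": 2, "age_group": "25-44", "income": 48000.0},
    {"hh": 2, "age_group": "45-64", "income": 61000.0},
    {"hh": 3, "age_group": "45-64", "income": None},
]

mean_imputed = impute_class_mean(people, "age_group", "income")
donor_imputed = impute_random_donor(people, "age_group", "income")
```

In this sketch the imputation classes are formed from a single auxiliary variable. A class with no respondents leaves the value missing rather than guessing.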
Imputation models are developed and assessed using the hierarchical structure of people within households. They are investigated for both continuous and binary missing response variables. Linear mixed imputation models, generalized linear mixed imputation models and donor imputation methods (random, within class and nearest neighbour) are investigated and compared to existing methods which do not exploit this hierarchical structure. The imputation methods are evaluated using data from two large-scale household surveys, the Household, Income and Labour Dynamics in Australia Survey (HILDA), and the British Household Panel Survey (BHPS), on a range of criteria relevant to household surveys.
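As a rough illustration of how a linear mixed imputation model can use the household hierarchy, consider a random-intercept model for person $i$ in household $j$. This is a standard textbook form and an assumption here; the abstract does not state the exact specification used in the thesis, and all symbols below are introduced only for this illustration.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

\hat{y}_{ij} = \mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}} + \hat{u}_j,
\qquad
\hat{u}_j = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_j}\,\bar{r}_j,
\qquad
\bar{r}_j = \frac{1}{n_j}\sum_{i \in R_j}\left(y_{ij} - \mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}}\right)
```

Here $R_j$ is the set of responding persons in household $j$ and $n_j$ is its size. Under these assumptions the imputed value for a nonrespondent is the regression prediction shifted by a shrunken average residual of the other household members. If nobody in the household responds, $\hat{u}_j$ is taken to be zero and the imputed value reduces to the single-level regression prediction.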
For continuous variables, a proposed household nearest neighbour method yields better imputed values than other donor methods, and the success of the linear mixed model increases with the level of clustering. For binary variables, the household nearest neighbour method and generalized linear mixed models both lead to improvements over standard donor and generalized linear methods.
The household imputation methods are most beneficial for improving predictive accuracy and reproducing within-household clustering in the imputed dataset. They are of some benefit for variance estimation but achieve little improvement over single-level methods in bias reduction. The level of improvement often depends on the assumed nonresponse mechanism, with the linear mixed model more beneficial than the household donor method under informative nonresponse and higher levels of clustering. Otherwise, the household donor method is generally at least as good as the multilevel model and is less complex to implement.
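One common way to quantify within-household clustering, and hence to check whether an imputed dataset reproduces it, is the intraclass correlation. The sketch below uses the one-way ANOVA estimator for unequal group sizes. It is offered as an assumption about how such a check could be done, not as the criterion used in the thesis; the function name and the toy values are hypothetical.

```python
def anova_icc(households):
    """One-way ANOVA estimate of the intraclass correlation.

    `households` is a list of lists, each inner list holding the values of
    one variable for the members of one household. At least two households
    are required, and at least one household must have more than one member.
    """
    sizes = [len(h) for h in households]
    k = len(households)          # number of households
    n_total = sum(sizes)         # number of persons
    grand_mean = sum(sum(h) for h in households) / n_total
    means = [sum(h) / len(h) for h in households]

    # Between-household and within-household sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for h, m in zip(households, means) for y in h)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average household size for unbalanced data.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Toy example: members of each household share the same value, so clustering is complete.
fully_clustered = anova_icc([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]])

# Toy example: all variation is within households, none between them.
no_between_variation = anova_icc([[1.0, 3.0], [1.0, 3.0]])
```

Computing this quantity on the respondent data and again on the completed dataset gives a simple comparison: an imputation method that ignores households would tend to pull the second value towards zero.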