Imputation of household survey data using linear mixed models
Mixed models are regularly used in the analysis of clustered data, but are only recently being used for imputation of missing data. In household surveys where multiple people are selected from each household, imputation of missing values should preserve the structure pertaining to people within households and should not artificially change the apparent intracluster correlation (ICC). This paper focuses on the use of multilevel models for imputation of missing data in household surveys. In particular, the performance of a best linear unbiased predictor for both stochastic and deterministic imputation using a linear mixed model is compared to imputation based on a single level linear model, both with and without information about household respondents. In this paper an evaluation is carried out in the context of imputing hourly wage rate in the Household, Income and Labour Dynamics of Australia Survey. Nonresponse is generated under various assumptions about the missingness mechanism for persons and households, and with low, moderate and high intra-household correlation to assess the benefits of the multilevel imputation model under different conditions. The mixed model and single level model with information about the household respondent lead to clear improvements when the ICC is moderate or high, and when there is informative missingness.