Degree Name

Doctor of Philosophy


School of Mathematics and Applied Statistics


Survey data are an important source of information for modern society. However, because modern populations have complex structures, effective surveys require sampling designs that are more complex than simple random sampling. In large national population surveys, the sample data collected via these designs typically include sample weights that allow analyses to account for these complex population structures. As a consequence, these sample weights need to be taken into consideration when modelling the sample data, for example when the target of estimation is the set of coefficients of a regression model for the target population. In this situation, it is important to know whether the weights should be used when identifying an appropriate model specification, and also whether they should be used when fitting this model to the survey data. Given the complexity of both model choice and model fitting, and the limited literature on this issue, there is clear scope for theoretical and methodological development to help with these decisions.

The principal aim of this thesis is to develop and evaluate strategies for population modelling using complex sample survey data. More specifically, since linear and logistic regression are both very widely used statistical modelling methods, our goal is to develop procedures for analysing complex sample survey data in order to choose appropriate linear and logistic regression models based on either unweighted or weighted modelling of the survey data. In particular, we develop two approaches to regression model choice, and the consequent regression model fit, given complex survey data: a likelihood-based approach and a prediction-based approach. Both approaches allow us to identify a final model given two competing models suggested by model search methods based on different inferential paradigms. The likelihood-based approach builds on the non-nested test suggested by Vuong (1989), while the prediction-based approach uses cross-validation. The two model search methods differ in whether they use the sample weights. That is, we investigate four modelling strategies defined by the combination of two approaches to model identification (likelihood-based versus cross-validation) and two paradigms for model search (unweighted versus weighted).
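
To make the likelihood-based comparison concrete, the sketch below computes the classical Vuong (1989) statistic for two non-nested Gaussian linear regression models. It is a minimal illustration only, not the procedure developed in the thesis: the simulated data, the weights `w`, and the helper names `gaussian_loglik_terms` and `vuong_z` are all assumptions introduced here, and the statistic shown is the standard unadjusted form, which does not itself correct for the sampling design.

```python
import numpy as np


def gaussian_loglik_terms(y, X, w=None):
    """Fit a linear model by (optionally weighted) least squares and return
    the per-observation Gaussian log-density at the fitted parameters."""
    if w is None:
        w = np.ones_like(y, dtype=float)  # unweighted fit
    sw = np.sqrt(w)
    # Weighted least squares via rescaling rows by sqrt(weight).
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta
    sigma2 = np.sum(w * resid**2) / np.sum(w)  # weighted variance estimate
    return -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)


def vuong_z(ll_f, ll_g):
    """Classical Vuong statistic: sqrt(n) * mean(m) / sd(m), where m is the
    per-observation log-likelihood difference between models f and g.
    Large positive values favour f; large negative values favour g."""
    m = ll_f - ll_g
    return np.sqrt(m.size) * m.mean() / m.std()


# Simulated (hypothetical) data: y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
w = rng.uniform(1.0, 5.0, size=n)  # hypothetical sample weights

X_f = np.column_stack([np.ones(n), x1])  # candidate from an unweighted search
X_g = np.column_stack([np.ones(n), x2])  # candidate from a weighted search

ll_f = gaussian_loglik_terms(y, X_f)       # unweighted fit
ll_g = gaussian_loglik_terms(y, X_g, w=w)  # weighted fit
z = vuong_z(ll_f, ll_g)
print(f"Vuong z-statistic: {z:.2f}")
```

Under the null hypothesis that the two models are equally close to the true data-generating process, the statistic is asymptotically standard normal, so it can be compared with the usual normal critical values. A prediction-based comparison would instead score each candidate on held-out folds of the sample.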