Degree Name

Doctor of Philosophy


School of Mathematics and Applied Statistics


This thesis considers the effect of modifying the zones used to analyse aggregate data for geographic areas. Two approaches are used. First, assuming that a population model is available, the effect of misspecifying the zones used to analyse the data is investigated. Second, the effects of zoning on parameter estimates are investigated empirically by creating a zoning distribution for the estimates obtained from an ecological model, for statistics such as population means and regression coefficients. The zoning distribution is the probability distribution (or density function) of the statistic over all possible sets of M zones that could be formed, given the constraints used in constructing the zones.
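
One way to write this definition down is sketched below. The notation is introduced here for illustration only and is not taken from the thesis: \mathcal{Z}_M is the set of admissible zonings into M zones, \hat{\theta}(z) is the statistic computed from data aggregated to zoning z, and K is the number of zonings actually sampled. The equal weighting of zonings is an assumption of this sketch.

```latex
% Zoning distribution of a statistic \hat{\theta} over all admissible
% zonings into M zones (equal weighting of zonings is assumed here).
F_M(t) \;=\; \frac{1}{|\mathcal{Z}_M|}
  \sum_{z \in \mathcal{Z}_M} \mathbf{1}\{\hat{\theta}(z) \le t\}

% Empirical zoning distribution from K sampled zonings z_1, \dots, z_K.
\hat{F}_M(t) \;=\; \frac{1}{K}
  \sum_{k=1}^{K} \mathbf{1}\{\hat{\theta}(z_k) \le t\}
```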

Using a combination of statistical theory and empirical investigation, the parameter estimates obtained from the ecological analysis of small area health data are investigated when different sets of geographical zones are used to aggregate the data. By aggregating the data to eight different scales using multiple sets of zones, an empirical zoning distribution is obtained for each parameter estimate at each scale of analysis. This allows the implications of using a particular set of zones to be assessed for both a continuous and a binary response variable.

The procedure for creating and analysing the distributions has some innovative aspects. To obtain detailed population information, unit record files from the Australian 2007-2008 National Health Survey (Australian Bureau of Statistics, 2009) were combined with area-level constraints from the 2006 Australian Census (Australian Bureau of Statistics, 2006a) to simulate realistic individual-level data using combinatorial optimisation techniques. The data were aggregated to 1000 sets of zones at eight scales of aggregation using the AZTool program. Empirical zoning distributions were created from the parameter estimates obtained from analysing the data summaries for each set of zones.
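
The general idea of an empirical zoning distribution can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the data are synthetic, the function name `zoning_distribution` is hypothetical, and base units are assigned to zones at random with no contiguity or homogeneity constraints, whereas AZTool builds zones under such constraints. For each random zoning, the sketch aggregates to zone means and records the slope of an ecological regression fitted to those means.

```python
import numpy as np

def zoning_distribution(x, y, n_zones, n_zonings=200, seed=0):
    """Return the ecological regression slope for each random zoning.

    Assumption: zones are random, equal-sized groups of base units,
    with no spatial contiguity constraint (unlike a real zoning tool).
    """
    rng = np.random.default_rng(seed)
    n_units = len(x)
    slopes = np.empty(n_zonings)
    for k in range(n_zonings):
        # Shuffle a balanced label vector so that no zone is empty.
        labels = rng.permutation(np.arange(n_units) % n_zones)
        counts = np.bincount(labels, minlength=n_zones)
        # Aggregate the unit-level data to zone means.
        x_bar = np.bincount(labels, weights=x, minlength=n_zones) / counts
        y_bar = np.bincount(labels, weights=y, minlength=n_zones) / counts
        # Slope of the least-squares line through the zone means.
        slopes[k] = np.polyfit(x_bar, y_bar, 1)[0]
    return slopes

# Synthetic unit-level data (purely illustrative).
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = 0.5 * x + rng.normal(size=2000)

for m in (50, 200, 1000):
    s = zoning_distribution(x, y, n_zones=m)
    print(m, s.mean(), s.std(ddof=1))
```

The array returned for each value of `n_zones` is an empirical zoning distribution for the slope at that scale. Its spread can then be compared across scales, for example through the standard deviation or a histogram.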

The results show that the zoning distribution exists, is approximately normal, and has a variance that increases as the scale increases. Even with over 1000 zones, the variance of the zoning distribution cannot be ignored when interpreting the results of a zone-based analysis.