Centre for Statistical & Survey Methodology Working Paper Series

Spatial microsimulation models are increasingly used to create realistic microdata for geographical areas, enabling statistical modelling of health, social and economic variables across a wide variety of applications. The models combine sample records with benchmark data for pre-defined geographic areas, typically by sampling or re-weighting sample records to fit a set of constraints for each area. The choice of constraints is a key factor in producing microdata that reflect the population structure.
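The re-weighting step can be illustrated with a minimal sketch of iterative proportional fitting, one common way to fit sample weights to area-level constraints. It is offered as an illustration under stated assumptions, not as the method used in this paper: the function name `reweight`, the record layout, and the benchmark counts are all hypothetical.

```python
def reweight(records, constraints, iterations=50):
    """Fit one weight per sample record to the benchmark totals of a single area.

    records     : list of dicts mapping variable name -> category (hypothetical layout)
    constraints : {variable: {category: benchmark count for the area}}
    Assumes every category found in the records also appears in the constraints.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        for var, targets in constraints.items():
            # Current weighted total for each category of this constraint variable.
            totals = {cat: 0.0 for cat in targets}
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Scale each record so its category total matches the benchmark.
            for i, rec in enumerate(records):
                cat = rec[var]
                if totals[cat] > 0:
                    weights[i] *= targets[cat] / totals[cat]
    return weights


# Illustrative sample records and made-up benchmark counts for one area.
sample = [
    {"age": "young", "sex": "f"},
    {"age": "young", "sex": "m"},
    {"age": "old", "sex": "f"},
    {"age": "old", "sex": "m"},
]
benchmarks = {
    "age": {"young": 60, "old": 40},
    "sex": {"f": 55, "m": 45},
}
area_weights = reweight(sample, benchmarks)
```

The fitted weights reproduce each benchmark margin only for the variables included in `constraints`, which is why the choice of constraint variables matters for how well the resulting microdata reflect the area's population.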

This paper introduces the use of within-area homogeneity for selecting categorical constraint variables for spatial microsimulation. The d-statistic is a measure of within-area homogeneity that is equivalent to intra-area correlation for areas with equal population. It can be used to identify the spatial autocorrelation exhibited by the categories of constraint variables, or by combinations of categories, which is an important feature to reproduce when modelling local variation in a variable. It can also be used to assess the statistical significance of the within-area homogeneity for a given set of categories, and can assist in validating spatial microsimulation models.
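To make the idea of intra-area correlation concrete, the sketch below computes the standard one-way ANOVA estimator of intra-class correlation for a 0/1 category indicator across areas of equal size. This is not the paper's d-statistic, whose definition is not given in this section; the function name `intra_area_correlation` and the input layout are assumptions made for illustration.

```python
def intra_area_correlation(areas):
    """One-way ANOVA estimate of intra-class correlation for equal-sized areas.

    areas : list of lists, one per area, each holding 0/1 indicators that mark
            whether a person falls in the category of interest (hypothetical layout).
    Raises ZeroDivisionError if the indicator does not vary at all.
    """
    k = len(areas)           # number of areas
    n = len(areas[0])        # people per area (assumed equal across areas)
    if k < 2 or n < 2 or any(len(a) != n for a in areas):
        raise ValueError("need at least two areas of equal size n >= 2")

    area_means = [sum(a) / n for a in areas]
    grand_mean = sum(area_means) / k

    # Between-area and within-area mean squares.
    msb = n * sum((m - grand_mean) ** 2 for m in area_means) / (k - 1)
    msw = sum(
        (y - m) ** 2 for a, m in zip(areas, area_means) for y in a
    ) / (k * (n - 1))

    return (msb - msw) / (msb + (n - 1) * msw)


# Perfectly homogeneous areas: everyone within an area shares the same category.
homogeneous = intra_area_correlation([[1, 1, 1], [0, 0, 0]])
# Identically mixed areas: no between-area variation at all.
mixed = intra_area_correlation([[1, 0], [1, 0]])
```

Values near 1 indicate that people in the same area tend to share the category, which is the kind of within-area homogeneity a well-chosen constraint variable would exhibit; values at or below 0 indicate no clustering by area.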