An important decision in designing a cluster or multi-stage sampling scheme is the number of units to select at each stage of selection. For a two-stage design, we need to decide how many units to select from each Primary Sampling Unit (PSU) in the sample. A common approach is to estimate the costs and the variance components associated with each stage of selection and then determine an optimal design. This is usually done for estimates of the means or totals of one or a small number of variables. In practice, the measure of intra-cluster homogeneity, which is the ratio of the variance components, needs to be estimated from a pilot study or historical data, and there may be considerable uncertainty about the intra-cluster correlation. The parameter can be close to zero, and its estimate may not even differ significantly from zero. However, a design based on zero intra-cluster correlation would be highly clustered and sensitive to any failure of this assumption. This paper considers the effect of uncertainty about the intra-cluster correlation and other relevant population parameters on sample design. We develop an approach to assess this uncertainty using a Bayesian bootstrap method.
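The sensitivity described above can be illustrated with the standard textbook optimum for a two-stage design. This is a minimal sketch and not the method developed in the paper: it assumes a linear cost model (a cost `c1` per PSU plus a cost `c2` per unit within a PSU) and a variance proportional to the design effect 1 + (m - 1)ρ divided by the total sample size. The function names, cost values, and correlation values are hypothetical and chosen only for illustration.

```python
import math


def optimal_cluster_size(c1, c2, rho):
    """Textbook optimum for the number of units per PSU.

    Assumes total cost n * (c1 + c2 * m) and variance proportional to
    (1 + (m - 1) * rho) / (n * m). Not taken from the paper.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.sqrt((c1 / c2) * (1 - rho) / rho)


def relative_variance(m, rho, c1, c2, budget=1.0):
    """Variance (up to a constant) for m units per PSU at a fixed budget."""
    n_psus = budget / (c1 + c2 * m)      # number of PSUs the budget affords
    design_effect = 1 + (m - 1) * rho    # inflation due to clustering
    return design_effect / (n_psus * m)


# Hypothetical inputs: each PSU costs ten times as much as one unit.
c1, c2 = 10.0, 1.0
rho_assumed, rho_true = 0.01, 0.05

m_assumed = optimal_cluster_size(c1, c2, rho_assumed)  # design actually used
m_true = optimal_cluster_size(c1, c2, rho_true)        # design that was needed

# Ratio above 1 means variance is lost by designing for too small a rho.
loss = (relative_variance(m_assumed, rho_true, c1, c2)
        / relative_variance(m_true, rho_true, c1, c2))
print(f"m (assumed rho) = {m_assumed:.1f}, m (true rho) = {m_true:.1f}")
print(f"variance ratio from misspecifying rho = {loss:.3f}")
```

Under these assumptions the optimal number of units per PSU grows without bound as ρ approaches zero, which is why a design built on a near-zero estimate is heavily clustered and loses efficiency if the true correlation is larger.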
Citation
This conference paper was originally published as Steel, D and Clark, R, Accounting for the uncertainty of information on clustering in the design of a clustered sample, Survey Research Methodology Conference, Taiwan, 2006.