Degree Name

Doctor of Philosophy


Center for Statistical and Survey Methodology


Sample surveys have long been used as a cost-effective means of data collection. Such data are used to provide suitable statistics not only for the population targeted by the survey but also for a variety of subpopulations, often called domains or areas. In practice, sampling designs, and in particular sample sizes, are chosen to provide reliable estimates for aggregates of the small areas, such as large geographical regions or broad demographic groups. Budget and other constraints usually prevent the allocation of samples to each sub-domain or small area that are large enough to provide reliable estimates using traditional techniques.

This thesis develops approaches to sample design that support small area estimation. Sample designs for small areas can be classified into two major categories:

• when it is feasible to select sample units in every small area;

• when only a subset of small areas can be surveyed.

The first case is represented by stratified sampling in which the strata are small areas. The second case is represented by two-stage sampling in which the clusters are small areas and are selected by either equal probability sampling (simple two-stage) or unequal probability sampling (general two-stage).

In each case, the aim is to find the best sample design for a combination of the anticipated mean squared errors of small area composite estimates and an overall estimator of the population mean, subject to a cost constraint.

This thesis develops sample designs which minimize or reduce this objective function, using analytical expressions for the optimal design, asymptotic approximations to the optimal design, optimal designs within restricted families of designs (such as power allocations), numerical optimization, and ad hoc approaches.
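As an illustration of a restricted family of designs, the following minimal sketch shows a simplified power allocation in which the stratum sample size is proportional to the stratum population size raised to an exponent q. This is a textbook-style simplification and not necessarily the form used in the thesis; the function name `power_allocation` and its parameters are hypothetical. Setting q = 1 gives proportional allocation and q = 0 gives equal allocation.

```python
def power_allocation(pop_sizes, n_total, q):
    """Allocate n_total sample units across strata in proportion to N_h ** q.

    Assumption: this is a simplified power allocation that ignores
    between-stratum differences in variability. The names used here are
    illustrative and are not taken from the thesis.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q is expected to lie between 0 and 1")
    if any(size <= 0 for size in pop_sizes):
        raise ValueError("stratum population sizes must be positive")
    weights = [size ** q for size in pop_sizes]
    total_weight = sum(weights)
    # Real-valued allocations; rounding to integers is left to the user.
    return [n_total * w / total_weight for w in weights]


if __name__ == "__main__":
    strata = [100, 400, 900]
    for q in (0.0, 0.5, 1.0):
        print(q, power_allocation(strata, 120, q))
```

In this sketch the exponent is supplied by the user; choosing it by numerical optimization would require an explicit objective function, which is not reproduced here.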

Power allocation with the exponent chosen by numerical optimization is found to be a nearly optimal strategy with appealing properties when all small areas can be selected in the sample. When only a subset of small areas can be selected, a two-stage unequal probability design is found to perform well, with cluster sizes given by the classical optimal cluster size. The optimal selection probabilities are a complex function of the cluster population sizes, which is derived analytically. When the only priority is small area estimation, the optimal design is to select the largest clusters with certainty and to select none of the remaining clusters. In the case where it is feasible to select a sample in every small area, analytical and approximate analytical optimal designs are developed. While optimal designs minimize the objective function, they have undesirable practical properties. Simpler designs, including the adjusted power allocation with the exponent chosen by numerical optimization, are nearly as effective.
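For readers unfamiliar with the classical optimal cluster size, the following minimal sketch computes the textbook result, assuming a linear cost model C = c1 n + c2 n m (n sampled clusters, m units per sampled cluster) and an intra-cluster correlation rho. Under these assumptions the within-cluster sample size that minimizes variance for a fixed cost is sqrt((c1 / c2) (1 - rho) / rho). The cost model, the function name and the parameter names are assumptions made for illustration; the thesis may define costs and correlation differently.

```python
import math


def optimal_cluster_sample_size(cost_per_cluster, cost_per_unit, rho):
    """Textbook optimal number of units to sample per selected cluster.

    Assumptions (not taken from the thesis): total cost is
    cost_per_cluster * n + cost_per_unit * n * m, and the variance of the
    estimated mean is proportional to (1 + (m - 1) * rho) / (n * m).
    """
    if cost_per_cluster <= 0 or cost_per_unit <= 0:
        raise ValueError("costs must be positive")
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.sqrt((cost_per_cluster / cost_per_unit) * (1.0 - rho) / rho)


if __name__ == "__main__":
    # Hypothetical costs: 100 per cluster, 4 per unit, rho = 0.2
    print(optimal_cluster_sample_size(100.0, 4.0, 0.2))  # approximately 10
```

The result is real-valued and does not depend on the cluster population size; in practice it would be rounded and capped at the number of units available in each cluster.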