Clark, Robert Graham, Statistical learning in sample design, Centre for Statistical and Survey Methodology, University of Wollongong, Working Paper 6-12, 2012, 25.
A well-designed sampling plan can greatly enhance the information that can be produced from a survey. Once a broad sample design is identified, specific design parameters such as sample sizes and selection probabilities need to be chosen. This is typically achieved using an optimal sample design, which minimises the variance of a key statistic or statistics, expressed as a function of design parameters and population characteristics, subject to a cost constraint. In practice, only imprecise estimates of population characteristics are available, but the effects of this variability are usually ignored. A general approach to sample allocation allowing for imprecise design data is proposed and evaluated. The approach is based on the availability of two sets of design data which can act as a check on each other. One application is to stratified sampling, where estimated stratum variances may be highly variable. Pooling strata into groups may reduce this variability, at the possible cost of some inefficiency. Proportional allocation, ignoring differences between stratum variances, could also be used. The new approach enables a data-driven compromise between all three. Simulation results based on real data show useful gains in a hypothetical farm survey, business survey and household survey of a subpopulation.
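To make the three candidate allocations in the stratified sampling application concrete, the following Python sketch compares them on invented numbers. It is an illustration under stated assumptions, not the method proposed in the paper: it uses the textbook Neyman allocation for a stratified mean with equal per-unit costs (so the cost constraint reduces to a fixed total sample size), the stratum sizes, standard deviations and grouping are hypothetical, and the size-weighted pooling rule is one simple choice among several. The data-driven compromise itself is not reproduced here.

```python
import math

def proportional_allocation(N, n):
    """Allocate n in proportion to stratum sizes N_h, ignoring variances."""
    total = sum(N)
    return [n * Nh / total for Nh in N]

def neyman_allocation(N, S, n):
    """Allocate n in proportion to N_h * S_h (equal unit costs assumed)."""
    weights = [Nh * Sh for Nh, Sh in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]

def pooled_sd(N, S, groups):
    """Replace each stratum SD by a size-weighted pooled SD within its group.

    The weighting by N_h is an assumption made for this sketch.
    """
    pooled = list(S)
    for group in groups:
        size = sum(N[h] for h in group)
        var = sum(N[h] * S[h] ** 2 for h in group) / size
        for h in group:
            pooled[h] = math.sqrt(var)
    return pooled

def stratified_variance(N, S_true, alloc):
    """Variance of the stratified mean under the true stratum SDs."""
    total = sum(N)
    return sum(
        (Nh / total) ** 2 * Sh ** 2 * (1.0 / nh - 1.0 / Nh)
        for Nh, Sh, nh in zip(N, S_true, alloc)
    )

# Hypothetical design data (invented for illustration only).
N = [1000, 500, 100]          # stratum population sizes
S_true = [2.0, 10.0, 50.0]    # true stratum SDs, unknown in practice
S_est = [3.0, 7.0, 80.0]      # imprecise estimates available at design time
groups = [[0, 1], [2]]        # hypothetical pooling of strata 0 and 1
n = 100                       # total sample size

allocations = {
    "proportional": proportional_allocation(N, n),
    "neyman_estimated": neyman_allocation(N, S_est, n),
    "neyman_pooled": neyman_allocation(N, pooled_sd(N, S_est, groups), n),
    "neyman_true": neyman_allocation(N, S_true, n),  # infeasible benchmark
}

# Evaluate every allocation against the true SDs.
variances = {
    name: stratified_variance(N, S_true, alloc)
    for name, alloc in allocations.items()
}

if __name__ == "__main__":
    for name, v in variances.items():
        print(f"{name:18s} variance = {v:.4f}")
```

The allocations are left unrounded to keep the comparison simple; a real design would round them and cap each stratum sample at its population size. Because every allocation is scored against the true standard deviations, the output shows how much efficiency is lost when the design relies on noisy or pooled variance estimates instead of the unattainable benchmark.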