#### Year

2010

#### Degree Name

Doctor of Philosophy

#### Department

University of Wollongong. School of Mathematics and Applied Statistics

#### Recommended Citation

Al-Zou'bi, Loai Mahmoud Awad, Adaptive inference and design for multistage surveys, Doctor of Philosophy thesis, University of Wollongong. School of Mathematics and Applied Statistics, University of Wollongong, 2010. http://ro.uow.edu.au/theses/3272

#### Abstract

Two-stage sampling usually leads to higher variances for estimators of means and regression coefficients, because of intra-class homogeneity. This thesis will develop and evaluate adaptive strategies for designing and analyzing two-stage surveys, where sample data will be used to determine the appropriate way of allowing for intraclass correlation. The approach to analysis will be based on fitting a linear regression model to estimate means and regression coefficients. One method for allowing for clustering in fitting a linear regression model is to use a linear mixed model with two levels. If the estimated intra-class correlation is close to zero, it may be acceptable to ignore clustering and use a single level model. This thesis will evaluate an adaptive approach for estimating the variances of estimated regression coefficients. The strategy is based on testing the null hypothesis that the random effect variance component is zero. If this hypothesis is not rejected, the estimated variances of estimated regression coefficients are extracted from the one-level linear model. Otherwise, the estimated variance is based on the linear mixed model, or, alternatively, the Huber-White robust variance estimator is used. Another adaptive strategy based on assessing the estimated design effect due to clustering is also evaluated. This is based on testing the null hypothesis that the random effect variance component is zero and at the same time comparing the estimated design effect to a predetermined cutoff value. If the null hypothesis is rejected and the estimated design effect is more than the predetermined cutoff value, the estimated variances of estimated regression coefficients are extracted from the linear mixed model, or, alternatively, the Huber-White robust variance estimator is used. Otherwise, the estimated variance is based on the one-level linear model.
This approach is found to be nearly identical in practice to the adaptive approach based on just testing the null hypothesis that the random effect variance component is zero. This adaptive strategy for estimation will be developed based on a two-level linear model assuming normality. It will be evaluated by simulation using normal data, with equal and unequal numbers of observations per cluster, and also using log-normal data, to assess the robustness of the approach to non-normality. The simulations indicate that extreme designs with 5 or fewer PSUs and many observations per cluster should be avoided. For these extreme designs, most methods perform poorly, including the adaptive methods and the linear mixed model, due to the difficulty of appropriately defining the degrees of freedom for this model. Apart from these extreme designs, the adaptive strategy is found to perform acceptably well, resulting in simpler analysis and slightly shorter confidence intervals. The use of a pilot survey to estimate the intraclass correlation will also be considered. The pilot estimate of this parameter can be used to estimate the optimal within-PSU sample size for the main survey. The best design based on a "cost-adjusted design effect" and the estimated variance of the estimated regression coefficients will be considered. An upper cutoff should be placed on the sample size to be selected from each PSU, to allow for the possibility of an under-estimate of the intraclass correlation from the pilot data. The optimal value of this cutoff is found to be between 10 and 50 depending on the pilot sample sizes.
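Two ideas from the abstract, the adaptive choice of variance estimator and the capped within-PSU sample size, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the 0.05 significance level, and the example cutoffs are hypothetical. The p-value for the variance-component test is assumed to be computed elsewhere. The sample-size formula is the standard textbook optimum under a simple linear cost model, which may differ from the "cost-adjusted design effect" criterion used in the thesis.

```python
import math
from typing import Optional


def design_effect(m: float, rho: float) -> float:
    """Standard clustering design effect, 1 + (m - 1) * rho, for m observations per PSU."""
    return 1.0 + (m - 1.0) * rho


def choose_variance_source(
    p_value: float,
    deff_hat: Optional[float] = None,
    alpha: float = 0.05,                  # illustrative significance level (assumption)
    deff_cutoff: Optional[float] = None,  # None means: use the test alone
) -> str:
    """Return which model should supply the variance estimates.

    p_value is for the null hypothesis that the random effect variance
    component is zero; it must be computed by the caller.
    """
    use_two_level = p_value < alpha
    if deff_cutoff is not None:
        if deff_hat is None:
            raise ValueError("deff_hat is required when deff_cutoff is given")
        # Second strategy: also require the estimated design effect to exceed the cutoff.
        use_two_level = use_two_level and deff_hat > deff_cutoff
    # "two-level" stands for the linear mixed model or the Huber-White estimator.
    return "two-level" if use_two_level else "one-level"


def capped_cluster_size(rho_hat: float, cost_ratio: float, m_max: int) -> int:
    """Within-PSU sample size from a pilot estimate of the intraclass correlation.

    Uses the textbook optimum sqrt(cost_ratio * (1 - rho) / rho), where cost_ratio
    is (cost per PSU) / (cost per observation), then caps the result at m_max to
    guard against an under-estimate of rho from the pilot data.
    """
    if rho_hat <= 0.0:
        return m_max  # formula diverges as rho -> 0, so fall back to the cap
    if rho_hat >= 1.0:
        return 1      # perfectly homogeneous clusters: one observation suffices
    m_star = math.sqrt(cost_ratio * (1.0 - rho_hat) / rho_hat)
    return max(1, min(m_max, round(m_star)))
```

For example, `capped_cluster_size(0.05, 10, 30)` evaluates the uncapped optimum, while a very small pilot estimate such as `rho_hat = 0.001` is limited by the cap of 30 rather than producing an impractically large within-PSU sample.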