Year

2017

Degree Name

Doctor of Philosophy

Department

School of Mathematics and Applied Statistics

Abstract

To improve the overall health of a population, accurate estimates of health indicators are needed at a fine spatial scale, such as the administrative units of a country or regions within a country. Direct estimators tend to have unacceptably high standard errors for areas with small sample sizes. Model-based indirect small area estimators borrow strength from related areas and can achieve lower mean squared errors.
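The borrowing-strength idea can be illustrated with the standard univariate Fay-Herriot EBLUP. The following is a minimal Python sketch, not the thesis's code. It assumes that the sampling variances psi_i are known, as in the usual Fay-Herriot set-up. It estimates the random-effect variance with the Prasad-Rao moment estimator, which is one common choice among several. It reports only the leading MSE term g1, which ignores the uncertainty from estimating beta and sigma_u^2. The function name fay_herriot_eblup and the simulated data are hypothetical.

```python
import numpy as np


def fay_herriot_eblup(y, X, psi):
    """Univariate Fay-Herriot EBLUP (illustrative sketch).

    y   : direct estimates, shape (m,)
    X   : area-level covariates, shape (m, p)
    psi : sampling variances, assumed known, shape (m,)
    """
    m, p = X.shape
    # Prasad-Rao moment estimator of sigma_u^2, based on OLS residuals.
    xtx_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (xtx_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    sigma2_u = max(0.0, (resid @ resid - np.sum(psi * (1.0 - leverage))) / (m - p))

    # GLS estimate of beta with weights 1 / (sigma_u^2 + psi_i).
    w = 1.0 / (sigma2_u + psi)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Shrink each direct estimate towards its regression-synthetic estimate.
    gamma = sigma2_u / (sigma2_u + psi)
    eblup = gamma * y + (1.0 - gamma) * (X @ beta)
    g1 = gamma * psi  # leading MSE term only; never exceeds psi
    return eblup, g1, sigma2_u


# Usage on simulated data (all values made up for illustration).
rng = np.random.default_rng(1)
m = 30
X = np.column_stack([np.ones(m), np.linspace(0.0, 1.0, m)])
psi = np.linspace(0.5, 2.0, m)
theta = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, m)
y = theta + rng.normal(0.0, np.sqrt(psi))
eblup, g1, sigma2_u = fay_herriot_eblup(y, X, psi)
print(sigma2_u, np.mean(g1), np.mean(psi))
```

The shrinkage factor gamma_i lies between 0 and 1, so g1 = gamma_i * psi_i never exceeds psi_i, the variance of the direct estimator. Areas with large sampling variances are pulled most strongly towards the regression-synthetic estimate.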

This thesis is concerned with multivariate small area estimation (SAE), in which several response variables, rather than a single one, are considered simultaneously. Two general problems are considered: (1) the use of a multivariate area level model to obtain improved estimates of each variable by area, and (2) the estimation of the cross-classification of two or more indicators by area.

The multivariate Fay-Herriot (MFH) model is the natural extension of the widely used univariate Fay-Herriot (UFH) model to two or more response variables considered together. Both numerical and simulation studies are carried out to investigate the conditions under which multivariate small area estimators perform better than separate univariate estimators. The results show that the MFH model performs better under certain conditions, which depend on the values of the model parameters, such as the random effect and sampling error components. For example, the gains from using the MFH model rather than separate UFH models are greater when the across-variable correlations of both the sampling errors and the area level random effects are high, and when the ratio of the sampling error variance to the random effect variance is high. A parametric bootstrap approach is developed to estimate mean squared errors and to construct confidence intervals for the gain due to multivariate modelling. The approaches are applied to a 2011/12 New Zealand Health Survey dataset. According to the mean squared error estimates of the estimated health indicators by electoral district, the MFH model provides some improvement over the UFH model. However, the confidence intervals for the relative efficiencies associated with multivariate modelling are wide. This suggests that in practice it is difficult to be confident about gains from multivariate approaches.
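The role of the two covariance matrices can be explored numerically. The sketch below makes several simplifying assumptions. It treats the regression coefficients as known. It also takes as given the 2 × 2 covariance matrices Sigma_u (area random effects) and Sigma_e (sampling errors) for a single area. Under these assumptions it compares only the leading MSE term of the bivariate BLUP with that of two separate univariate BLUPs. It does not reproduce the thesis's simulations or its parametric bootstrap. The helper names cov2, mfh_g1, ufh_g1 and relative_efficiency are hypothetical.

```python
import numpy as np


def cov2(var1, var2, rho):
    """2 x 2 covariance matrix from two variances and a correlation."""
    off = rho * np.sqrt(var1 * var2)
    return np.array([[var1, off], [off, var2]])


def mfh_g1(sigma_u, sigma_e):
    """Leading MSE term of the bivariate BLUP for one area, with known
    parameters: Sigma_u - Sigma_u (Sigma_u + Sigma_e)^{-1} Sigma_u."""
    return sigma_u - sigma_u @ np.linalg.solve(sigma_u + sigma_e, sigma_u)


def ufh_g1(sigma_u, sigma_e):
    """Leading MSE term of separate univariate BLUPs, one per variable."""
    su, se = np.diag(sigma_u), np.diag(sigma_e)
    return su * se / (su + se)


def relative_efficiency(sigma_u, sigma_e):
    """Ratio UFH / MFH per variable; values above 1 favour the MFH model."""
    return ufh_g1(sigma_u, sigma_e) / np.diag(mfh_g1(sigma_u, sigma_e))


# Equal random-effect variances; vary the sampling-error correlation.
for rho_e in (0.0, 0.4, 0.8):
    print(rho_e, relative_efficiency(cov2(1.0, 1.0, 0.2), cov2(2.0, 2.0, rho_e)))
```

With known parameters the bivariate BLUP uses both direct estimates, so its leading term is never worse than the univariate one. However, this leading term alone shows no gain when Sigma_e is a scalar multiple of Sigma_u. The sketch therefore does not capture every factor behind the gains described above.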

A unit level approach for producing small area estimates of cross-classified counts of two or more indicators is developed, based on a multinomial logit mixed model with category-specific random effects. The application is novel because a contingency table is modelled in each small area. Other researchers have considered trinomial data (such as counts of unemployed, employed and inactive people), whereas we extend these multinomial methods to allow small area estimation of cross-classified counts. For example, Obesity by High Blood Pressure counts can be estimated for each area in a health survey. The new method also differs from the well-known Structure Preserving Estimation (SPREE) approach. SPREE combines auxiliary variable information from a previous census with current survey data to improve estimators of cell totals in a multi-way contingency table. Mean squared errors are estimated using parametric bootstrap methods. Data from the New Zealand Health Survey are used to illustrate the approach.
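For a 2 × 2 cross-classification, the four cells can be treated as the categories of one multinomial response. The minimal sketch below is a plug-in predictor for one area under a baseline-category logit with category-specific area random effects. It takes point estimates of the coefficients and the area's predicted random effects as given; model fitting and the bootstrap MSE are not shown. A full empirical best predictor would average over the conditional distribution of the random effects rather than plug in their predicted values. The cell coding, parameter values and function names are hypothetical.

```python
import numpy as np


def category_probs(x, beta, u):
    """Baseline-category multinomial logit with area random effects.

    x    : unit covariates, shape (n, p)
    beta : coefficients for categories 1..K-1, shape (K-1, p)
    u    : the area's category-specific random effects, shape (K-1,)
    Category 0 is the reference. Returns probabilities, shape (n, K).
    """
    eta = x @ beta.T + u                               # (n, K-1)
    eta = np.column_stack([np.zeros(len(x)), eta])     # reference: eta = 0
    eta -= eta.max(axis=1, keepdims=True)              # numerical stability
    p = np.exp(eta)
    return p / p.sum(axis=1, keepdims=True)


def area_cell_counts(sample_counts, x_nonsampled, beta, u):
    """Plug-in 2 x 2 table for one area: observed sample counts plus
    expected counts for the non-sampled units.
    Cell order: (no, no), (no, yes), (yes, no), (yes, yes) for
    (obesity, high blood pressure); rows of the result are obesity."""
    expected = category_probs(x_nonsampled, beta, u).sum(axis=0)
    return (sample_counts + expected).reshape(2, 2)


# Hypothetical estimates for one area with 100 non-sampled units.
x_ns = np.column_stack([np.ones(100), np.linspace(-1.0, 1.0, 100)])
beta = np.array([[-1.0, 0.3], [-0.5, 0.2], [-2.0, 0.5]])
u = np.array([0.1, -0.2, 0.05])
print(area_cell_counts(np.array([20, 5, 8, 3]), x_ns, beta, u))
```

Because each unit's probabilities sum to one, the predicted table always adds up to the sample size plus the number of non-sampled units in the area.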

Small area estimators of cross-classified counts are also developed based on log-linear models, which are parsimonious special cases of the multinomial logit model. Several such log-linear models are defined and applied to the New Zealand Health Survey data. These models did not perform particularly well for the 2x2 tables considered in the application, but they scale better computationally to three-way and higher-order cross-classifications.
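The relationship between the two model classes can be made concrete for tables of binary variables. In the minimal sketch below, each cell's baseline-category logit is a sum of log-linear terms. Keeping all interactions reproduces the unrestricted multinomial logit, with K-1 free parameters. Dropping higher-order interactions gives a more parsimonious model, such as independence for a 2x2 table. The abstract does not list which log-linear models the thesis uses. The area random effects that a small area version would attach to some of these terms are not shown. The helper names are hypothetical.

```python
import itertools

import numpy as np


def loglinear_design(n_factors, max_order):
    """Design matrix for a hierarchical log-linear model on a 2^n table.

    Rows are cells (binary factor levels in itertools.product order);
    columns are the terms (products of factor indicators) up to
    max_order. The intercept is omitted because it cancels in the
    baseline-category logits, so the all-zero cell is the reference.
    """
    cells = list(itertools.product([0, 1], repeat=n_factors))
    terms = [t for r in range(1, max_order + 1)
             for t in itertools.combinations(range(n_factors), r)]
    z = np.array([[np.prod([cell[f] for f in t]) for t in terms]
                  for cell in cells], dtype=float)
    return z, terms


def cell_probs(z, gamma):
    """Cell probabilities implied by log-linear parameters gamma."""
    eta = z @ gamma
    p = np.exp(eta - eta.max())
    return p / p.sum()


# Free parameters per table: saturated vs. more parsimonious models.
for n, order in [(2, 2), (2, 1), (3, 3), (3, 2), (3, 1)]:
    print(f"2^{n} table, terms up to order {order}: {loglinear_design(n, order)[0].shape[1]}")
```

For a 2^n table, the saturated model has 2^n - 1 terms, whereas a model with main effects and two-way interactions has n(n+1)/2. This suggests why such restrictions matter more as the number of cross-classified indicators grows.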

Overall, multivariate Fay-Herriot small area estimators are useful in specific situations identified in the thesis; however, with real data it is difficult to be confident whether one is in such a situation. Multivariate categorical models enable a new form of small area statistic to be calculated, namely contingency tables of survey variables by area.
