#### Year

2017

#### Degree Name

Doctor of Philosophy

#### Department

School of Mathematics and Applied Statistics

#### Recommended Citation

Dawber, James, Advances in M-quantile estimation, Doctor of Philosophy thesis, School of Mathematics and Applied Statistics, University of Wollongong, 2017. https://ro.uow.edu.au/theses1/188

#### Abstract

M-quantile estimators are a generalised form of quantile-like M-estimators introduced by Breckling and Chambers (1988). Quantiles are a type of M-quantile based on least absolute deviations, and the lesser-known expectiles are based on least squares. So, just as the median and the mean are types of M-estimators, the quantile and the expectile are types of M-quantile estimators. Another type of M-quantile is based on the Huber estimator, which uses a tuning constant to adjust the robustness of the estimator in the presence of outliers. The tuning constant provides an intermediary estimator between the quantile and the expectile. This robustness property, together with the mild distributional assumptions of M-estimation and the quantile-like framework, makes Huber M-quantile estimators very versatile.
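For orientation, the usual form of the M-quantile definition can be written as follows. The notation here ($Q_q$, $\sigma$, $c$, $F$) is the common convention in the M-quantile literature and is assumed rather than taken from this abstract.

$$
\int \psi_q\!\left(\frac{y - Q_q}{\sigma}\right) dF(y) = 0,
\qquad
\psi_q(r) = 2\,\psi(r)\,\bigl\{\, q\,I(r > 0) + (1 - q)\,I(r \le 0) \,\bigr\},
$$

$$
\psi(r) = \max\{-c,\ \min(r,\ c)\} \quad \text{(Huber influence function with tuning constant } c\text{)}.
$$

As $c \to \infty$, $\psi(r) = r$ and $Q_q$ is the expectile of order $q$. As $c \to 0$, $\psi(r)/c \to \operatorname{sgn}(r)$ and $Q_q$ approaches the quantile of order $q$.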

Huber M-quantiles are not scale-equivariant, so a nuisance scale parameter must be estimated. Different estimates of this scale parameter can lead to substantially different M-quantile estimates, so it is important to investigate the role and cause of these differences. Four scale estimators were investigated. The most commonly used M-quantile scale estimator, a 'naive' median absolute deviation (MAD), was found to be erroneously generalised to M-quantiles. A second proposed scale estimator, based on maximum likelihood, was shown to be nonrobust and unsuitable for general M-quantile estimation. Two scale estimators were found to be more suitable: the 'corrected' MAD and a new estimator based on the method of moments (MM). Both were shown to perform better than the naive MAD estimator and performed similarly to each other. Furthermore, the corrected MAD estimator was unaffected by changes to the tuning constant, which is a useful property. The MM scale estimator provides an appropriate alternative.
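The role of the scale parameter can be illustrated with a minimal location-only sketch in Python. It is not the thesis's implementation: the function names `naive_mad` and `huber_m_quantile`, the bisection solver, the default tuning constant 1.345, and the use of a MAD about the sample median are all assumptions made for illustration. The corrected MAD and MM estimators are not reproduced because the abstract does not define them.

```python
import numpy as np

def naive_mad(y):
    """MAD about the sample median, divided by 0.6745 (normal consistency).

    Assumption: a location-only stand-in for the 'naive' MAD scale estimate.
    """
    y = np.asarray(y, dtype=float)
    return np.median(np.abs(y - np.median(y))) / 0.6745

def huber_m_quantile(y, q, c=1.345, scale=1.0, n_iter=200):
    """Location Huber M-quantile of order q, found by bisection.

    Solves sum_i psi_q((y_i - theta) / scale) = 0, where psi_q is the
    asymmetrically weighted Huber influence function.
    """
    y = np.asarray(y, dtype=float)

    def estimating_fn(theta):
        r = (y - theta) / scale
        side = np.where(r > 0, q, 1.0 - q)        # asymmetric weights
        return np.sum(2.0 * side * np.clip(r, -c, c))

    # The estimating function is continuous and non-increasing in theta,
    # non-negative at min(y) and non-positive at max(y).
    lo, hi = y.min(), y.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if estimating_fn(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
k = 10.0  # rescaling factor applied to the data

# Fixed scale: rescaling the data does not simply rescale the estimate.
fixed_a = huber_m_quantile(y, 0.75, scale=1.0)
fixed_b = huber_m_quantile(k * y, 0.75, scale=1.0)

# Data-driven scale: the estimate rescales with the data.
mad_a = huber_m_quantile(y, 0.75, scale=naive_mad(y))
mad_b = huber_m_quantile(k * y, 0.75, scale=naive_mad(k * y))

print(fixed_b - k * fixed_a, mad_b - k * mad_a)
```

Bisection is used here only because the one-dimensional estimating function is monotone, which makes the sketch easy to verify. A very large `c` reproduces the expectile (the mean at `q = 0.5`), and a very small `c` approaches the quantile.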

Although M-quantile estimation had already been extended to binary data, it had not yet been extended to categorical data. A method is presented which enables this application. Rather than generalising the pre-existing binary M-quantile estimation method to categorical data, a simpler definition of binary M-quantiles is first proposed. This results in a simple relationship between the probability and the M-quantile of binary data, and the resulting estimates are comparable in performance to the pre-existing estimates. The main advantage of the proposed method is that it can be easily generalised to categorical data. The categorical M-quantile can be estimated from categorical probabilities, which are in turn estimated with a multinomial logistic model. This categorical M-quantile method was shown to perform well in small area estimation with contaminated data, and to be computationally efficient relative to the other categorical methods in small area estimation.

In order to widen the applications of M-quantiles, some new methods are proposed that utilise M-quantile q-scores. These q-scores provide ordered indices corresponding to where observations lie on the conditional distribution, and are fundamental to the use of M-quantiles in small area estimation. It is shown that the q-scores are actually values from a distribution function related to the data distribution and the influence function. Through an understanding of this relationship, an inverse M-quantile function can be derived which has useful properties for model diagnostics. Methods which utilise these q-scores and this inverse M-quantile function are proposed for assessing normality of regression residuals, identifying distributional characteristics of the residual distribution, variable selection, and calculating an optimal tuning constant with contaminated data. Following these diagnostic tools, some further diagnostic plots are shown to help verify when M-quantile regression estimates are appropriately fitted in practice.
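The idea of a q-score as a value of a distribution-like function can be sketched for the simplest case, a single sample with no covariates. Setting the sample estimating equation to zero at a given value and solving for q gives a ratio of summed influence-function values below and above that value. This is a location-only analogue derived under that assumption, not the thesis's regression method, and the name `q_score` and its defaults are hypothetical.

```python
import numpy as np

def q_score(value, y, c=1.345, scale=None):
    """Order q for which `value` is the sample Huber M-quantile of y.

    Solving sum_i psi_q((y_i - value) / scale) = 0 for q gives
    q = B / (B + A), where B and A are the summed absolute Huber
    influence values of observations below and above `value`.
    Undefined (0 / 0) if every observation equals `value`.
    """
    y = np.asarray(y, dtype=float)
    if scale is None:
        # Assumption: MAD about the median as a default scale estimate.
        scale = np.median(np.abs(y - np.median(y))) / 0.6745
    psi = np.clip((y - value) / scale, -c, c)
    below = -psi[psi < 0].sum()
    above = psi[psi > 0].sum()
    return below / (below + above)

y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
scores = [q_score(v, y) for v in y]
print(scores)
```

The scores are non-decreasing in the value, equal 0 at the sample minimum and 1 at the sample maximum, so they behave like an ordered index on the unit interval. With a very large `c` the score of the sample mean is 0.5, matching the expectile case.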

The methods in this thesis advance M-quantile estimation and enhance its potential to be used more widely in practice.