Spatial statistical inference from a decision-theoretic viewpoint with application to non-Gaussian environmental data
In this thesis, I address some challenges related to spatial-statistical inference on noisy, incomplete, non-Gaussian, and potentially large environmental spatial datasets. Established theory in spatial statistics relies on the twin pillars of Gaussian spatial processes and the ubiquitous squared-error loss function to construct models and subsequently distil them into optimal predictions and uncertainty measures (e.g., kriging and kriging variances). The appropriateness of both Gaussian spatial models and the squared-error loss function is brought into question in many environmental science problems for the following reasons. First, environmental spatial data are frequently non-Gaussian (e.g., skewed and positive). Second, from a decision-theoretic viewpoint, the squared-error loss function may be inappropriate because it implies that the consequences of under-predicting and over-predicting the process by the same amount are identical whereas, in many applications, one kind of error can carry much more serious consequences than the other. This thesis presents novel methodology to model non-Gaussian spatial data and to handle the associated decision problem of optimal spatial prediction for non-Gaussian spatial processes in the presence of an asymmetric loss structure.
On loss functions, I develop spatial-statistical inference on positive-valued spatial processes by replacing squared-error loss in the spatial-prediction problem with the family of asymmetric Cressie-Read power-divergence loss functions. I investigate the consequences of the replacement by characterising the resulting optimal spatial predictor, its properties, and associated uncertainty quantification. In addition, I develop a new method to characterise loss-function asymmetry for positive-valued spatial processes; I illustrate methods for calibrating the power-divergence loss function to the decision problem at hand; and I present useful closed-form results for log-Gaussian spatial models, which are commonly used to analyse skewed, positive-valued spatial data. An application is given to a real dataset of soil zinc contamination in a floodplain of the Meuse River in the Netherlands.
On modelling non-Gaussian spatial processes, I use spatial copulas for modelling flexibility. Copula-based models can capture non-Gaussian marginal behaviour as well as non-Gaussian spatial dependence structures. However, spatial copula models lack a general formulation of a hierarchical spatial-statistical modelling framework to enable noisy, incomplete, and large spatial data to be used for prediction of a latent scientific spatial process. I establish a fully Bayesian hierarchical spatial-statistical modelling framework for spatial copula models that enables inference on a latent scientific process from noisy, incomplete, non-Gaussian, and large spatial data. Other technical innovations are presented, such as elliptical spatial copulas with structured covariance matrices for efficient computations with large spatial datasets, and a modified hierarchical-statistical model structure that handles the change-of-support between different spatial resolutions. These innovations are then applied to spatial prediction from a dataset of non-Gaussian, remotely sensed atmospheric methane concentrations over a region of coal-mining activity in the Bowen Basin in Queensland, Australia.
History
Year
2025
Thesis type
- Doctoral thesis