Centre for Statistical & Survey Methodology Working Paper Series

Remote sensing technology for the study of Earth and its environment has led to “Big Data” that, paradoxically, have global extent but may be spatially sparse. Furthermore, the variability in the measurement error and the latent process error may not fit conveniently into the Gaussian linear paradigm. In this paper, we consider the problem of selecting a predictor from a finite collection of spatial predictors of a spatial random process defined on D, a subset of d-dimensional Euclidean space. Critically, we make no statistical distributional assumptions other than additive measurement error. In this nonparametric setting, one could use a criterion based on a validation dataset to select a spatial predictor for all of D. Instead, we propose local criteria based on validation data to select a predictor at each spatial location in D; the result is a hybrid combination of the spatial predictors, which we call a locally selected predictor (LSP). We consider selection from a collection of some of the classical and more recently proposed spatial predictors currently available. In a simulation study, the relative performances of various LSPs, as well as the performance of each of the individual spatial predictors in the collection, are assessed. “Big Data” are always challenging, and here we apply LSP to a very large global spatial dataset of atmospheric CO2 measurements.
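
To make the local-selection idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes one particular local criterion, the mean squared validation error over the k nearest validation locations, which is only one of many possible choices. The function name `locally_selected_predictor`, the parameter `k`, and the two toy predictors are hypothetical and introduced purely for illustration.

```python
import numpy as np


def locally_selected_predictor(pred_val, z_val, s_val, pred_target, s_target, k=10):
    """Pick, at each target location, the predictor with the smallest local validation error.

    pred_val    : (K, n_val) predictions of K candidate predictors at validation locations
    z_val       : (n_val,)   validation observations (process plus additive measurement error)
    s_val       : (n_val, d) validation locations
    pred_target : (K, n_t)   predictions of the same K predictors at target locations
    s_target    : (n_t, d)   target locations
    k           : number of nearest validation points defining the local neighbourhood
                  (an assumed form of "local"; other neighbourhoods are possible)
    """
    # Squared validation error of every predictor at every validation location.
    sq_err = (pred_val - z_val[None, :]) ** 2                      # (K, n_val)

    # Euclidean distances between target and validation locations.
    dists = np.linalg.norm(s_target[:, None, :] - s_val[None, :, :], axis=2)  # (n_t, n_val)

    # Indices of the k nearest validation points for each target location.
    k = min(k, s_val.shape[0])
    nearest = np.argpartition(dists, k - 1, axis=1)[:, :k]         # (n_t, k)

    # Local criterion: mean squared validation error within each neighbourhood.
    local_mse = sq_err[:, nearest].mean(axis=2)                    # (K, n_t)

    # Select the minimising predictor separately at each target location.
    chosen = np.argmin(local_mse, axis=0)                          # (n_t,)
    lsp = pred_target[chosen, np.arange(s_target.shape[0])]
    return lsp, chosen


# Toy example on D = [0, 1]: predictor 0 is biased on the right half of D,
# predictor 1 is biased on the left half, so neither is best everywhere.
rng = np.random.default_rng(0)
s_val = np.linspace(0.0, 1.0, 101)[:, None]
truth_val = np.sin(2 * np.pi * s_val[:, 0])
z_val = truth_val + rng.normal(scale=0.1, size=truth_val.shape)
pred_val = np.stack([truth_val + (s_val[:, 0] > 0.5),
                     truth_val + (s_val[:, 0] <= 0.5)])

s_target = np.array([[0.1], [0.9]])
truth_target = np.sin(2 * np.pi * s_target[:, 0])
pred_target = np.stack([truth_target + (s_target[:, 0] > 0.5),
                        truth_target + (s_target[:, 0] <= 0.5)])

lsp, chosen = locally_selected_predictor(pred_val, z_val, s_val,
                                         pred_target, s_target, k=5)
```

In this toy setting a single global criterion would have to commit to one predictor for all of D, whereas the local criterion switches predictors between the two halves of the domain, which is the hybrid behaviour the abstract describes.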