posted on 2025-12-12, 01:28 authored by Matthew Sainsbury-Dale, Andrew Zammit-Mangion, Noel Cressie, Raphaël Huser
Advancements in artificial intelligence (AI) and deep learning have led to
neural networks being used to generate lightning-speed answers to complex
questions, to paint like Monet, or to write like Proust. Leveraging their
computational speed and flexibility, neural networks are also being used to
facilitate fast, likelihood-free statistical inference. However, it is not
straightforward to use neural networks with data that for various reasons are
incomplete, which precludes their use in many applications. A recently proposed
approach to remedy this issue inputs an appropriately padded data vector and a
vector that encodes the missingness pattern to a neural network. While
computationally efficient, this "masking" approach can result in statistically
inefficient inferences. Here, we propose an alternative approach that is based
on the Monte Carlo expectation-maximization (EM) algorithm. Our EM approach is
likelihood-free, substantially faster than the conventional EM algorithm as it
does not require numerical optimization at each iteration, and more
statistically efficient than the masking approach. This research represents a
prototype problem that indicates how improvements could be made in AI by
introducing Bayesian statistical thinking. We compare the two approaches to
missingness using simulated incomplete data from two models: a spatial Gaussian
process model, and a spatial Potts model. The utility of the methodology is
shown on Arctic sea-ice data and cryptocurrency data.
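
To make the two approaches to missingness concrete, the following is a minimal Python sketch under strong simplifying assumptions, not the authors' implementation. The function names (`encode_masked`, `stand_in_estimator`, `monte_carlo_em`), the use of `None` for missing values, the zero padding, and the convention that 1 marks a missing entry are all hypothetical. The toy model (independent N(theta, 1) data) is chosen only because conditional simulation is trivial there, and the pooled sample mean stands in for a trained neural estimator that acts on complete data.

```python
import random
from statistics import fmean


def encode_masked(z, pad_value=0.0):
    """Masking input: padded data vector followed by a missingness indicator.

    Assumed conventions: missing entries are None, padding is pad_value,
    and the indicator is 1.0 where a value is missing.
    """
    padded = [pad_value if v is None else v for v in z]
    mask = [1.0 if v is None else 0.0 for v in z]
    return padded + mask


def stand_in_estimator(replicates):
    """Placeholder for a trained neural estimator applied to complete data.

    Here it is the pooled sample mean, which is the complete-data
    maximum-likelihood estimate of theta in the toy N(theta, 1) model.
    """
    return fmean(v for rep in replicates for v in rep)


def monte_carlo_em(z, theta0=0.0, n_iter=50, n_sim=30, seed=1):
    """Monte Carlo EM-style iteration with an estimator in place of the M-step."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(n_iter):
        # Conditional simulation: in this independent toy model, each missing
        # value given the observed data is simply N(theta, 1).
        completed = [
            [rng.gauss(theta, 1.0) if v is None else v for v in z]
            for _ in range(n_sim)
        ]
        # Update: apply the complete-data estimator to the completed data,
        # with no numerical optimization inside the loop.
        theta = stand_in_estimator(completed)
    return theta


z = [1.2, None, 0.7, 1.9, None, 1.4]
print(encode_masked(z))
print(monte_carlo_em(z))
```

In this toy setting the iterates settle, up to Monte Carlo noise, near the mean of the observed values (1.3), which is what the EM fixed point should be for this model. For spatial models such as those named above, the conditional simulation step would have to account for dependence between missing and observed values, and the stand-in estimator would be replaced by a neural network trained on complete simulated data.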