A practical approach to making use of uncertain species presence-only data in ecology: Reclassification, regularization methods and observer bias
Various statistical models and software platforms aim to produce species distribution models that better predict where species occur as a function of the environment. However, many practical challenges arise when observations come from opportunistic surveys. Such data may be of low accuracy and may also exhibit sampling bias. Here, we explore three main challenges. First, species identification can be unreliable after changes in taxonomy: where species within a genus have been reclassified, older records become confounded with respect to species identity. Second, the pattern of observer sampling may not reflect the true species distribution, as some observers may favor particular areas where the species is found. Third, ecological knowledge of the environmental drivers of a species distribution may be limited, which makes it difficult to select appropriate covariates to include in species distribution models. In this paper, we extend two algorithms we recently developed that make use of misidentified observations to predict species distributions using spatial point processes. In particular, the extended algorithms incorporate sampling bias correction and address potential overfitting of the model via lasso-type penalties. We compare the performance of these algorithms with that of models that do not make use of the confounded species data, and explore the effects of the lasso penalty and bias correction on model performance. We apply the best-performing methods to a real dataset of eastern Australian frogs for which the taxonomy recently changed. Including confounded observations in the models is particularly relevant for informing management decisions regarding endangered species and species in remote areas.
Open Access Status
This publication is not available as open access