Data-driven estimation of COVID-19 community prevalence through wastewater-based epidemiology

Publication Name

Science of the Total Environment


Wastewater-based epidemiology (WBE) has been regarded as a potential tool for estimating the prevalence of coronavirus disease 2019 (COVID-19) in the community. However, application of the conventional back-estimation approach is currently limited by methodological challenges and various uncertainties. This study systematically performed a meta-analysis of WBE datasets and investigated the use of data-driven models for estimating COVID-19 community prevalence in lieu of the conventional WBE back-estimation approach. Three data-driven models, i.e. multiple linear regression (MLR), artificial neural network (ANN), and adaptive neuro-fuzzy inference system (ANFIS), were applied to the multi-national WBE dataset. To evaluate the robustness of these models, predictions for sixteen scenarios with partial inputs were compared against the actual prevalence reports from clinical testing. The performance of the models was further validated using unseen data (datasets not used to establish the models) from different stages of the COVID-19 outbreak. In general, the ANN and ANFIS models showed better accuracy and robustness than the MLR models. Air and wastewater temperature played a critical role in prevalence estimation by the data-driven models, especially the MLR models. With unseen datasets, the ANN model reasonably estimated the prevalence of COVID-19 (cumulative cases) at the initial phase and forecast upcoming new cases 2–4 days ahead at the post-peak phase of the COVID-19 outbreak. This study provides essential information about the feasibility and accuracy of data-driven estimation of COVID-19 prevalence through the WBE approach.
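
As a rough illustration of the idea of comparing a regression model fitted on full inputs against one fitted on partial inputs, the following minimal Python sketch fits an ordinary least-squares MLR to a small synthetic table. It is not the study's code or data: the feature columns (log10 viral RNA concentration, wastewater temperature, air temperature), the generating coefficients, and every helper name are hypothetical assumptions made only for this example, and the record does not state which inputs the sixteen scenarios actually used.

```python
# Minimal ordinary-least-squares MLR in pure Python (standard library only).
# Every number and feature name below is synthetic and hypothetical;
# nothing here is taken from the study's dataset.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(rows, targets):
    """Return [intercept, coef_1, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def rmse(coefs, rows, targets):
    errs = [(predict(coefs, r) - t) ** 2 for r, t in zip(rows, targets)]
    return (sum(errs) / len(errs)) ** 0.5

# Hypothetical inputs: (log10 RNA concentration, wastewater temp, air temp).
full_rows = [
    (3.0, 18.0, 12.0), (3.5, 20.0, 15.0), (4.0, 17.0, 10.0),
    (4.2, 22.0, 25.0), (4.8, 19.0, 14.0), (5.1, 24.0, 21.0),
]
# Synthetic target generated from an assumed, noise-free linear rule.
targets = [2.0 + 1.5 * r - 0.1 * w + 0.05 * a for r, w, a in full_rows]

# "Partial input" scenario: drop both temperature columns.
partial_rows = [(r,) for r, _, _ in full_rows]

full_rmse = rmse(fit_mlr(full_rows, targets), full_rows, targets)
partial_rmse = rmse(fit_mlr(partial_rows, targets), partial_rows, targets)
print(f"full inputs RMSE:    {full_rmse:.6f}")
print(f"partial inputs RMSE: {partial_rmse:.6f}")
```

Because the synthetic target is built to depend on the temperature columns, the partial-input fit shows a larger error than the full-input fit; this follows from how the example was constructed and says nothing about the magnitude of the effect reported in the study.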

Open Access Status

This publication may be available as open access



Funding Sponsor

Australian Research Council


