Journal article

Using “big data” and non-linear machine learning to infer groundwater contamination mechanisms across a spatially extensive, geologically heterogeneous region

Abstract

Groundwater accounts for approximately 98% of available freshwater, with >2 billion people relying on it as a primary drinking water source. Notwithstanding its importance, specific groundwater quality parameters, namely microbial concentrations and non-Escherichia coli coliforms (NEC), remain understudied. The current study sought to address this gap by modelling three distinct Contamination Indices (CI) corresponding to E. coli concentration, NEC concentration, and the NEC:E. coli concentration ratio. CIs were developed for southern Ontario (115,693 km²) using ∼1 million samples from ∼290,000 wells collected between 2010 and 2021. To permit modelling, CIs were linked to 50 subregion-specific variables which impact groundwater quality (e.g., well depth, aquifer type, mean daily precipitation volumes); Generalized Additive Models (GAM) were subsequently developed and associated non-linear partial effects were calculated. Findings suggest NEC concentrations may appropriately indicate a source’s long-term potential for generalized contamination, as the NEC model exhibited high deviance explained (91.9%) due to significant associations (p < 0.05) with factors influencing and/or representing groundwater recharge. A daily summer rainfall “tipping point” was identified, with volumes >3 mm being associated with NEC concentration reductions (p < 0.0001), potentially due to subsoil saturation and/or aquifer contamination dilution. Regions with predominantly deep wells in bedrock aquifers were associated (p < 0.0001) with low NEC:E. coli ratios, i.e., localized contamination mechanisms (e.g., contaminant bypass or short-circuiting) likely dominate in these regions. The presumption that deeper aquifers/wells are “safer” may thus be due for reconsideration. The importance of understanding and inferring contamination mechanisms cannot be overstated, as it serves as a foundation for evidence-based source protection and testing recommendations.

Authors

Petculescu I; Majury A; Brown RS; McDermott K; Hynds P

Journal

Water Research X, Vol. 30

Publisher

Elsevier

Publication Date

January 1, 2026

DOI

10.1016/j.wroa.2025.100475

ISSN

2589-9147

Labels

Sustainable Development Goals (SDG)
