An investigation of microbial groundwater contamination seasonality and extreme weather event interruptions using "big data", time-series analyses, and unsupervised machine learning.
Abstract
Temporal studies of groundwater potability have historically focused on E. coli detection rates; by comparison, non-E. coli coliforms (NEC) and microbial concentrations remain understudied. Additionally, "big data" (i.e., large, diverse datasets that grow over time) have yet to be employed for assessing the effects of high return-period extreme weather events on groundwater quality. The current investigation used ≈1.1 million Ontarian private well samples collected between 2010 and 2021 to address these knowledge gaps by applying time-series decomposition, interrupted time-series analysis (ITSA), and unsupervised machine learning to five microbial contamination parameters: E. coli and NEC concentrations (CFU/100 mL) and detection rates (%), and the calculated NEC:E. coli ratio. Time-series decompositions revealed E. coli concentrations and the NEC:E. coli ratio to be complementary metrics, and interpreting their seasonal signals together indicated that localized contamination mechanisms dominate during winter months. ITSA findings highlighted the importance of hydrogeological time lags: for example, a significant E. coli detection rate increase (2.4% vs 1.8%, p = 0.02) was identified 12 weeks after the May 2017 flood event. Unsupervised machine learning spatially classified annual contamination cycles across Ontarian subregions (n = 27), with the highest inter-cluster variability identified among E. coli detection rates and the lowest among NEC detection rates and the NEC:E. coli ratio. Given the spatiotemporal consistency identified for NEC and the NEC:E. coli ratio, associated interpretations and recommendations are likely transferable across large, heterogeneous regions. The presented study may serve as a methodological blueprint for future temporal investigations employing "big" groundwater quality data.
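Interrupted time-series analysis is commonly implemented as a segmented regression, in which an event is allowed to shift both the level and the trend of a series. The following minimal Python sketch illustrates that general technique; it is not the model used in this study. It assumes NumPy, an evenly spaced weekly series, and plain ordinary least squares with no autocorrelation correction and no p-values. The function name `itsa_segmented_regression`, its `lag` argument (which moves the breakpoint to mimic a hydrogeological delay), and the synthetic data are all hypothetical and for illustration only.

```python
import numpy as np


def itsa_segmented_regression(y, event_index, lag=0):
    """Fit a basic segmented-regression ITSA model (illustrative sketch).

    Model (an assumption, not the study's specification):
        y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t
    where t0 = event_index + lag and post_t = 1 for t >= t0, else 0.

    Returns [b0, b1, b2, b3]: baseline level, pre-breakpoint trend,
    level change at the breakpoint, and trend change after it.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)          # week index 0..n-1
    t0 = event_index + lag                      # breakpoint shifted by the lag
    post = (t >= t0).astype(float)              # 0 before breakpoint, 1 after
    X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    return coef


if __name__ == "__main__":
    # Synthetic, noise-free weekly detection rates (%): a hypothetical
    # event at week 20 whose effect appears 12 weeks later.
    weeks = np.arange(52)
    rates = 2.0 + 0.5 * (weeks >= 32)
    coef = itsa_segmented_regression(rates, event_index=20, lag=12)
    print(f"level change: {coef[2]:.2f}")
```

Running the example prints `level change: 0.50`. With real surveillance data, the residuals are usually autocorrelated and seasonal, so an analysis would typically add seasonal terms and autocorrelation-robust standard errors before it could report significance. Refitting with `lag=0` on a delayed response would place the breakpoint too early, which is one way to see why time lags matter when an ITSA is specified.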