An evaluation of logic regression-based biomarker discovery across multiple intergenic regions for predicting host specificity in Escherichia coli
- Additional Document Info
- View All
Several studies have demonstrated that E. coli appears to display some level of host adaptation and specificity. Recent studies in our laboratory support these findings as determined by logic regression modeling of single nucleotide polymorphisms (SNP) in intergenic regions (ITGRs). We sought to determine the degree of host-specific information encoded in various ITGRs across a library of animal E. coli isolates using both whole genome analysis and a targeted ITGR sequencing approach. Our findings demonstrated that ITGRs across the genome encode various degrees of host-specific information. Incorporating multiple ITGRs (i.e., concatenation) into logic regression model building resulted in greater host-specificity and sensitivity outcomes in biomarkers, but the overall level of polymorphism in an ITGR did not correlate with the degree of host-specificity encoded in the ITGR. This suggests that distinct SNPs in ITGRs may be more important in defining host-specificity than overall sequence variation, explaining why traditional unsupervised learning phylogenetic approaches may be less informative in terms of revealing host-specific information encoded in DNA sequence. In silico analysis of 80 candidate ITGRs from publically available E. coli genomes was performed as a tool for discovering highly host-specific ITGRs. In one ITGR (ydeR-yedS) we identified a SNP biomarker that was 98% specific for cattle and for which 92% of all E. coli isolates originating from cattle carried this unique biomarker. In the case of humans, a host-specific biomarker (98% specificity) was identified in the concatenated ITGR sequences of rcsD-ompC, ydeR-yedS, and rclR-ykgE, and for which 78% of E. coli originating from humans carried this biomarker. Interestingly, human-specific biomarkers were dominant in ITGRs regulating antibiotic resistance, whereas in cattle host-specific biomarkers were found in ITGRs involved in stress regulation. These data suggest that evolution towards host specificity may be driven by different natural selection pressures on the regulome of E. coli among different animal hosts.
has subject area