skDER: microbial genome dereplication approaches for comparative and metagenomic applications
Journal Articles

abstract

  • skDER (https://github.com/raufs/skDER) combines skani [1], a recent advance for efficiently estimating average nucleotide identity (ANI) between thousands of microbial genomes, with two low-memory methods for genomic dereplication. The first method implements a dynamic algorithm to determine a concise set of representative genomes. This approach is well suited to selecting reference genomes onto which metagenomic reads can be aligned to track strain presence across related microbiome samples, because fewer representative genomes should alleviate the concern that reads belonging to the same strain are falsely partitioned across closely related genomes. The other method, which uses a greedy approach, is better suited to comparative genomics, where users might be overwhelmed by the large number of genomes available for certain taxa and aim to reduce redundancy and, therefore, the computational requirements of downstream analyses. This method selects a larger number of representative genomes to comprehensively sample the pangenome space of the taxon of interest. To further aid comparative genomics studies, skDER also features an option to automatically download genomes classified as a particular species or genus in the Genome Taxonomy Database (GTDB) [2–4], and we provide precomputed representative genomes for commonly studied bacterial taxa [5].
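The abstract contrasts a concise (dynamic) selection with a greedy one but does not spell out either algorithm. The Python sketch below is a hypothetical illustration, not skDER's implementation: the genome names, ANI values, quality scores (standing in for something like assembly N50), the 99% ANI cutoff, and the function names `greedy_dereplicate` and `concise_dereplicate` are all assumptions. The greedy rule keeps a genome unless it matches an already-kept representative; the concise rule, used here only as a stand-in for a method that returns fewer representatives, drops a genome if it matches any higher-ranked genome, so redundancy can propagate through genomes that were themselves dropped.

```python
"""Hypothetical sketch of two ANI-based dereplication rules.

Not skDER's code: genome names, ANI values, quality scores, and the
99% ANI cutoff are illustrative assumptions.
"""
from typing import Dict, FrozenSet, List

Pairs = Dict[FrozenSet[str], float]  # symmetric pairwise ANI (%)


def ani(pairs: Pairs, a: str, b: str) -> float:
    """Look up pairwise ANI; unlisted pairs are treated as unrelated (0)."""
    return pairs.get(frozenset((a, b)), 0.0)


def rank(scores: Dict[str, float]) -> List[str]:
    """Order genomes best-first by an assumed quality score (e.g. N50)."""
    return sorted(scores, key=lambda g: (-scores[g], g))


def greedy_dereplicate(scores: Dict[str, float], pairs: Pairs,
                       cutoff: float = 99.0) -> List[str]:
    """Keep a genome unless it matches an already-kept representative."""
    reps: List[str] = []
    for g in rank(scores):
        if all(ani(pairs, g, r) < cutoff for r in reps):
            reps.append(g)
    return reps


def concise_dereplicate(scores: Dict[str, float], pairs: Pairs,
                        cutoff: float = 99.0) -> List[str]:
    """Drop a genome if it matches ANY higher-ranked genome, kept or not.

    Redundancy can chain through dropped genomes, so this rule never
    keeps more representatives than greedy_dereplicate.
    """
    ordered = rank(scores)
    reps: List[str] = []
    for i, g in enumerate(ordered):
        if all(ani(pairs, g, h) < cutoff for h in ordered[:i]):
            reps.append(g)
    return reps


if __name__ == "__main__":
    # Hypothetical chain: A ~ B and B ~ C clear the cutoff, A and C do not.
    scores = {"A": 3.0, "B": 2.0, "C": 1.0}
    pairs = {
        frozenset(("A", "B")): 99.5,
        frozenset(("B", "C")): 99.5,
        frozenset(("A", "C")): 98.5,
    }
    print("greedy: ", greedy_dereplicate(scores, pairs))   # ['A', 'C']
    print("concise:", concise_dereplicate(scores, pairs))  # ['A']
```

In this three-genome chain, greedy selection keeps both ends because A and C fall below the cutoff, whereas the concise rule collapses all three into A. This mirrors the trade-off described above: fewer references for read mapping versus broader pangenome coverage. The sketch ignores aligned fraction, which skani also reports and which practical dereplication would typically filter on as well.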

publication date

  • September 29, 2023