Journal article

skDER & CiDDER: two scalable approaches for microbial genome dereplication

Abstract

An abundance of microbial genomes has been sequenced in the past two decades. For fundamental comparative genomic investigations, where the goal is to determine the major gain and loss events shaping the pangenome of a species, it is often unnecessary and computationally onerous to include all available genomes. In addition, over-representation of specific lineages due to sampling and sequencing bias can have undesired effects on evolutionary analyses. To assist users with genomic dereplication, that is, selecting a subset of representative genomes for downstream comparative genomic investigations, we developed skDER & CiDDER (https://github.com/raufs/skDER). skDER combines recent advances in efficiently estimating average nucleotide identity (ANI) between thousands of microbial genomes with two efficient algorithms for genomic dereplication. Further, CiDDER implements an approach whereby protein clusters are determined across all genomes and genomes are iteratively selected as representatives until a user-defined saturation of the total protein space is achieved. To support ease of use, several auxiliary functionalities are implemented within the two programs, including arguments to: (i) test the number of representative genomes resulting from a variety of clustering parameters, (ii) automate downloading of genomes belonging to a bacterial species or genus by name, (iii) cluster non-representative genomes to their closest representative genomes, and (iv) automatically filter predicted plasmids and phages prior to dereplication. We further assess the effects of filtering mobile genetic elements (MGEs) on ANI and alignment fraction (AF) estimates between pairs of genomes and find that MGEs tend to slightly deflate both metrics in one species.
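The iterative selection described for CiDDER can be illustrated with a greedy coverage loop. The sketch below is not taken from the skDER/CiDDER code base. It assumes each genome is already summarized as a set of protein cluster identifiers and that "saturation" means the fraction of all distinct clusters covered by the chosen representatives. The function name `select_representatives`, the `saturation` parameter, and the alphabetical tie-breaking are hypothetical choices made for illustration.

```python
def select_representatives(genome_clusters, saturation=0.9):
    """Greedily pick genomes until `saturation` of all protein clusters is covered.

    genome_clusters: dict mapping genome name -> set of protein cluster IDs.
    saturation: target fraction (0 to 1) of distinct clusters to cover.
    Hypothetical illustration only; not the CiDDER implementation.
    """
    if not genome_clusters:
        return []
    all_clusters = set().union(*genome_clusters.values())
    if not all_clusters:
        return []

    covered = set()
    representatives = []
    remaining = dict(genome_clusters)

    while remaining and len(covered) / len(all_clusters) < saturation:
        # Pick the genome contributing the most clusters not yet covered.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda g: len(remaining[g] - covered))
        if not remaining[best] - covered:
            break  # no remaining genome adds anything new
        representatives.append(best)
        covered |= remaining.pop(best)

    return representatives


# Toy input: four genomes sharing six protein clusters.
genomes = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 2},
}
print(select_representatives(genomes, saturation=0.8))
```

In this toy input, genome A is chosen first because it carries the most clusters, and genome C follows because it adds two unseen clusters where B adds only one. A lower `saturation` value stops the loop earlier and yields fewer representatives.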

Authors

Salamzade R; Kottapalli A; Kalan LR

Publisher

Cold Spring Harbor Laboratory

Publication Date

September 29, 2023

DOI

10.1101/2023.09.27.559801

ISSN

2692-8205