abstract
- BACKGROUND: Among naturally occurring small molecules, tRNA-derived cyclodipeptides have attracted attention for their diverse and desirable biological activities. However, no tools are available to link cyclodipeptide synthases identified within prokaryotic genome sequences to their chemical products. Consequently, it is unclear how many genetically encoded cyclodipeptides represent novel products and which producing organisms should be targeted for discovery. RESULTS: We developed a pipeline that identifies and classifies cyclodipeptide biosynthetic gene clusters and predicts their aminoacyl-tRNA substrates and complete chemical structures. We used this tool to conduct a global analysis of tRNA-derived cyclodipeptide biosynthesis in 93,107 prokaryotic genomes, and compared the predicted cyclodipeptides with known cyclodipeptide synthase products and with all chemically characterized cyclodipeptides. By integrating predicted chemical structures and gene cluster architectures, we created a unified map of known and unknown genetically encoded cyclodipeptides. CONCLUSIONS: Our analysis suggests that sizeable regions of the chemical space encoded within sequenced prokaryotic genomes remain unexplored. Our map of the landscape of genetically encoded cyclodipeptides provides candidates for targeted discovery of novel compounds. The pipeline is integrated into a user-friendly web application, providing a resource for further discovery of cyclodipeptides in newly sequenced prokaryotic genomes.