abstract
- Long-read sequencing technologies provide an efficient approach to generating highly contiguous and informative assemblies. However, their higher error rates relative to short-read technologies can introduce frameshifts and premature stop codons that pseudogenize genes, hindering downstream analyses. We developed a software tool that detects gene-fragmenting errors in draft assemblies of small genomes by comparing them with a curated set of reference genome sequences and with raw read information. In the example presented, the detected errors represent less than 0.05% of the genome, yet correcting them reduced the pseudogene rate in the example long-read assemblies from 23.3% to 5.6%, comparable to the rate in short-read assemblies. We demonstrate that this software can detect assembly errors in long-read assemblies generated from small genomes and correct them to de-fragment genes.