Correcting Illumina data

Abstract

Next-generation sequencing technologies have revolutionized the ways in which genetic information is obtained and have opened the door for many essential applications in biomedical sciences. Hundreds of gigabytes of data are being produced, and all downstream applications are affected by errors in these data. Many programs have been designed to correct these errors, most of them targeting data produced by the dominant Illumina technology. We present a thorough comparison of these programs. Both HiSeq and MiSeq types of Illumina data are analyzed, and correction performance is evaluated as the gain in depth and breadth of coverage, as given by correct reads and k-mers. Time and memory requirements, scalability, and parallelism are considered as well. Practical guidelines are provided for the effective use of these tools. We also evaluate the efficiency of the current state-of-the-art programs for correcting Illumina data and provide research directions for further improvement.

Authors

Molnar M; Ilie L

Journal

Briefings in Bioinformatics, Vol. 16, No. 4, pp. 588–599

Publisher

Oxford University Press (OUP)

Publication Date

July 10, 2014

DOI

10.1093/bib/bbu029

ISSN

1467-5463

Associated Experts

Lucian Ilie

Adjunct Professor, Faculty of Engineering