E-MEM: efficient computation of maximal exact matches for very large genomes

Abstract

MOTIVATION: Alignment of similar whole genomes is often performed using anchors given by the maximal exact matches (MEMs) between their sequences. Despite a significant amount of research on this problem, computing MEMs for large genomes remains challenging. The leading current algorithms employ full-text indexes, with the sparse suffix array giving the best results. Still, their memory requirements are high, their parallelization is not very efficient, and they cannot handle very large genomes. RESULTS: We present a new algorithm, efficient computation of MEMs (E-MEM), that does not use full-text indexes. Our algorithm uses much less space and is highly amenable to parallelization. It can compute all MEMs of minimum length 100 between the whole human and mouse genomes on a 12-core machine in 10 min using 2 GB of memory; the required memory can be as low as 600 MB. It can efficiently process genomes of any size. Extensive testing and comparison with the current best algorithms are provided. AVAILABILITY AND IMPLEMENTATION: The source code of E-MEM is freely available at: http://www.csd.uwo.ca/~ilie/E-MEM/ CONTACT: ilie@csd.uwo.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
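To make the notion of a MEM concrete, the following is a minimal brute-force sketch of the definition only. It is not the E-MEM algorithm, whose details the abstract does not describe, and it runs in quadratic time, so it is suitable only for toy inputs. The function name `maximal_exact_matches` and its parameters are hypothetical and chosen purely for illustration. A match is reported when it cannot be extended to the left or to the right and its length is at least `min_len`.

```python
def maximal_exact_matches(ref, qry, min_len):
    """Return (ref_start, qry_start, length) for every MEM of length >= min_len.

    Illustrative brute force over all position pairs; NOT the E-MEM method.
    """
    mems = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            if ref[i] != qry[j]:
                continue
            # Skip pairs that can be extended to the left: the match
            # starting here would not be left-maximal.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            # Extend to the right as far as the sequences agree, which
            # makes the match right-maximal.
            k = 0
            while (i + k < len(ref) and j + k < len(qry)
                   and ref[i + k] == qry[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


if __name__ == "__main__":
    # Toy sequences; real inputs would be whole genomes.
    print(maximal_exact_matches("ACGTACGT", "TACGTA", 3))
    # prints [(0, 1, 5), (3, 0, 5)]
```

The minimum length plays the role of the threshold mentioned in the abstract (100 for the human and mouse comparison); raising it discards short, likely spurious matches.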

Authors

Khiste N; Ilie L

Journal

Bioinformatics, Vol. 31, No. 4, pp. 509–514

Publisher

Oxford University Press (OUP)

Publication Date

February 15, 2015

DOI

10.1093/bioinformatics/btu687

ISSN

1367-4803

Associated Experts

Lucian Ilie

Adjunct Professor, Faculty of Engineering
