The longest common extension problem revisited and...

The longest common extension problem revisited and applications to approximate string searching

Abstract

The Longest Common Extension (LCE) problem considers a string s and computes, for each pair (i,j), the longest substring of s that starts at both i and j. It appears as a subproblem in many fundamental string problems and can be solved by linear-time preprocessing of the string that allows (worst-case) constant-time computation for each pair. The two known approaches use powerful algorithms: either constant-time computation of the Lowest Common Ancestor in trees or constant-time computation of Range Minimum Queries in arrays. We show here that, from practical point of view, such complicated approaches are not needed. We give two very simple algorithms for this problem that require no preprocessing. The first is 5 times faster than the best previous algorithms on the average whereas the second is faster on virtually all inputs. As an application, we modify the Landau–Vishkin algorithm for approximate matching to use our simplest LCE algorithm. The obtained algorithm is 13 to 20 times faster than the original. We compare it with the more widely used Ukkonen's cutoff algorithm and show that it behaves better for a significant range of error thresholds.