Home
Scholarly Works
Frequency Covers for Strings
Conference

Frequency Covers for Strings

Abstract

We study a central problem of string processing: the compact representation of a string by its frequently-occurring substrings. In this paper we propose an effective, easily-computed form of quasi-periodicity in strings, the frequency cover; that is, the longest of those repeating substrings u of w, |u| > 1, that occurs the maximum number of times in w. The advantage of this generalization is that it is not only applicable to all strings but also that it is the only generalized notion of cover yet proposed, which can be computed efficiently in linear time and space. We describe a simple data structure called the repeating substring frequency array (ℛ𝒮ℱ array) for the string w, which we show can be constructed in 𝒪(n) time and 𝒪(n) space, where |w| = n. We then use ℛ𝒮ℱ to compute all the frequency covers of w in linear time and space. Our research also allows us to give an alternate algorithm to compute all non-extendible repeating substrings in w, also in 𝒪(n) time and space.

Authors

Mhaskar N; Smyth WF

Volume

163

Pagination

pp. 275-289

Publisher

SAGE Publications

Publication Date

November 3, 2018

DOI

10.3233/fi-2018-1744

Conference proceedings

Fundamenta Informaticae

Issue

3

ISSN

0169-2968
View published work (Non-McMaster Users)

Contact the Experts team