WORD COMPLEXITY AND REPETITIONS IN WORDS

Abstract

With ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, R (w), is defined as the smallest amount of space needed to store w when reduced by repeatedly applying the following procedure: n consecutive occurrences uu…u of the same subword u of w are stored as (u,n). The repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, SUB , and Lempel-Ziv complexity, LZ . We have always R (w)≥ LZ (w) and could even be that the former is linear while the latter is only logarithmic; e.g., this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α being ultimately periodic is equivalent to: (i) [Formula: see text], (ii) [Formula: see text], and (iii) [Formula: see text]. De Bruijn words, well known for their high subword complexity, are shown to have almost highest repetition complexity; the precise complexity remains open. R (w) can be computed in time [Formula: see text] and it is open, and probably very difficult, to find fast algorithms.