Repetition Complexity of Words

Abstract

With ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, r(w), is defined as the smallest amount of space needed to store w when reduced by repeatedly applying the following procedure: n consecutive occurrences uu... u of the same subword u of w are stored as (u, n). The repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, sub, and Lempel-Ziv complexity, lz. We have always r(w) ≥ lz(w) and could even be that the former is linear while the latter is only logarithmic; e.g., this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α being ultimately periodic is equivalent to: (i) sub(prefn(α)) = $$ \mathcal{O} $$ (n), (ii) lz(prefn(α)) = $$ \mathcal{O} $$ (1), and (iii) r(prefn(α)) = lgn + $$ \mathcal{O} $$ (1). De Bruijn words, well known for their high subword complexity are shown to have almost highest repetition complexity; the precise complexity remains open. r(w) can be computed in time $$ \mathcal{O} $$ (n3(logn)2) and it is open, and probably very difficult, to find very fast algorithms.