Abstract
For 30 years the Lempel-Ziv factorization LZx of a string x = x[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) in x. With the rise of the Internet, the demand for Lempel-Ziv factorization grew enormously; nowadays compression based on it underlies standard, efficient data transmission formats on the Internet.
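To fix ideas, in a commonly used variant of the factorization each phrase of x is either a single letter making its first appearance or the longest prefix of the remaining suffix that already occurs at an earlier position in x. The sketch below illustrates this greedy rule only; it is a naive quadratic-time computation, not one of the algorithms discussed in this thesis, and the phrase definition it assumes may differ in detail from the one adopted later.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Naive greedy LZ factorization (illustrative sketch, O(n^2) worst case):
// each phrase is the longest prefix of the remaining suffix that also
// occurs starting at an earlier position; if no such prefix exists,
// the phrase is a single letter seen here for the first time.
std::vector<std::string> lz_factorize(const std::string& x) {
    std::vector<std::string> phrases;
    std::size_t i = 0, n = x.size();
    while (i < n) {
        std::size_t best = 0;                       // longest earlier match found so far
        for (std::size_t j = 0; j < i; ++j) {       // candidate earlier starting position
            std::size_t len = 0;
            while (i + len < n && x[j + len] == x[i + len]) ++len;
            if (len > best) best = len;
        }
        std::size_t step = (best == 0) ? 1 : best;  // new letter => phrase of length 1
        phrases.push_back(x.substr(i, step));
        i += step;
    }
    return phrases;
}

int main() {
    // Example: "abaababa" factors as  a | b | a | aba | ba
    for (const auto& p : lz_factorize("abaababa"))
        std::cout << p << ' ';
    std::cout << '\n';
}
```

The suffix-tree and suffix-array based algorithms described next avoid the quadratic scan over earlier positions and compute the factorization in O(n) time.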
Traditionally the standard method for computing LZx was based on O(n)-time processing of the suffix tree STx of x. Ukkonen's algorithm constructs the suffix tree online, and so permits LZx to be built incrementally from subtrees of STx; this gives it an advantage, at least in terms of space, over the fast and compact version of McCreight's STCA [37] due to Kurtz [24]. In 2000 Abouelhoda, Kurtz & Ohlebusch proposed an O(n)-time Lempel-Ziv factorization algorithm based on an "enhanced" suffix array, that is, a suffix array SAx together with other supporting data structures.
In this thesis we first examine several previous algorithms for computing the Lempel-Ziv factorization. We then analyze the rationale behind their development and introduce a collection of new algorithms for computing the LZ factorization. Through theoretical analysis and experimental comparison of running time and storage usage, we show that our new algorithms appear to be superior to those previously proposed, either in their theoretical behavior, or in practice, or both. In the last chapter we draw conclusions about our new algorithms and point out some open problems for future research.