Fast Data Compression with Antidictionaries

Abstract

We consider data compression using antidictionaries and give algorithms for faster compression and decompression. While the original method of Crochemore et al. uses finite transducers with ε-moves, we (de)compress using ε-free transducers. This is provably faster, assuming the data is non-negligibly compressible, but we have to account for the overhead of building the new machines. In general, the ε-free machines can be quadratic in size compared to the ones allowing ε-moves; we prove this bound optimal, as it is reached for de Bruijn words. In practice, however, the size of the ε-free machines turns out to be close to that of the ones allowing ε-moves, and therefore we can achieve significantly faster (de)compression. We show our results for the files in the Calgary corpus.
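
For readers unfamiliar with antidictionaries, the following is a minimal sketch of the underlying principle only. It does not use transducers, neither the ones with ε-moves nor the ε-free ones discussed here. It assumes a binary text and simply looks up forbidden words (words that never occur in the text) in a set: whenever some suffix of the text read so far, extended by one bit, would form a forbidden word, the next bit is determined and need not be stored. The names forbidden_words, predict, compress, decompress and the length bound max_len are hypothetical choices made for this illustration.

```python
def forbidden_words(text, max_len):
    """Words u+c of length <= max_len where u occurs in text but u+c does not.

    This is a naive (non-minimal) antidictionary, built by brute force.
    """
    n = len(text)
    factors = {text[i:j] for i in range(n + 1)
               for j in range(i, min(n, i + max_len) + 1)}
    return {u + c for u in factors if len(u) < max_len
            for c in "01" if u + c not in factors}


def predict(prefix, antidict, max_len):
    """Return the forced next bit, or None if the next bit is not determined."""
    for k in range(min(len(prefix), max_len - 1) + 1):
        u = prefix[len(prefix) - k:]  # suffix of length k (empty for k == 0)
        for c in "01":
            if u + c in antidict:
                # c is impossible after u, so the next bit must be the other one
                return "1" if c == "0" else "0"
    return None


def compress(text, antidict, max_len):
    """Keep only the bits that the antidictionary cannot predict."""
    return "".join(text[i] for i in range(len(text))
                   if predict(text[:i], antidict, max_len) is None)


def decompress(bits, length, antidict, max_len):
    """Rebuild a text of the given length from the unpredictable bits."""
    out, stored = "", iter(bits)
    while len(out) < length:
        forced = predict(out, antidict, max_len)
        out += forced if forced is not None else next(stored)
    return out


text = "0101010101"
max_len = 3
antidict = forbidden_words(text, max_len)
packed = compress(text, antidict, max_len)
restored = decompress(packed, len(text), antidict, max_len)
```

In this sketch the decoder needs the antidictionary and the original length in addition to the stored bits. Each predict call rescans the suffixes of the prefix, which is the kind of per-symbol work an automaton-based implementation is meant to avoid.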