Improved Coding over Sets for DNA-Based Data Storage
Abstract
Error-correcting codes over sets, with applications to DNA storage, are
studied. The DNA-storage channel receives a set of sequences and produces a
corrupted version of the set; the corruption may include sequence loss, symbol
substitution, symbol insertion/deletion, and limited-magnitude errors in symbols. Various
parameter regimes are studied. New bounds on code parameters are provided,
which improve upon known bounds. New codes are constructed, at times matching
the bounds up to lower-order terms or small constant factors.