On the Asymptotic Rate of Optimal Codes that Correct Tandem Duplications
for Nanopore Sequencing
Abstract
We study codes that can correct backtracking errors during nanopore
sequencing. In this channel, a sequence of length $n$ over an alphabet of size
$q$ is read by a sliding window of length $\ell$, and from each window
we obtain only its composition. Backtracking errors cause some windows to
repeat, hence manifesting as tandem-duplication errors of length $k$ in the
$\ell$-read vector of window compositions. While existing constructions for
duplication-correcting codes can be straightforwardly adapted to this model,
even resulting in optimal codes, their asymptotic rate is hard to find. In the
regime of an unbounded number of duplication errors, we either give the exact
asymptotic rate of optimal codes, or bounds on it, depending on the values of
$k$, $\ell$ and $q$. In the regime of a constant number of duplication errors,
$t$, we find the redundancy of optimal codes to be $t\log_q n+O(1)$ when
$\ell \mid k$, and to be upper bounded by this quantity otherwise.
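
To make the channel model concrete, the following minimal sketch computes a vector of window compositions and applies a single tandem duplication of length $k$ to it. It is an illustration under stated assumptions, not the construction studied in the paper: only full windows of length $\ell$ are considered (no partial windows at the sequence boundaries), a composition is represented as a sorted tuple of (symbol, count) pairs, and the names `read_vector` and `tandem_duplicate` are hypothetical.

```python
from collections import Counter


def read_vector(x, ell):
    """Return the compositions of all full length-ell windows of x.

    Assumption: only full windows are used; each composition is stored
    as a sorted tuple of (symbol, count) pairs so that it is hashable
    and independent of symbol order inside the window.
    """
    if ell < 1 or ell > len(x):
        raise ValueError("window length must satisfy 1 <= ell <= len(x)")
    return [
        tuple(sorted(Counter(x[i:i + ell]).items()))
        for i in range(len(x) - ell + 1)
    ]


def tandem_duplicate(v, i, k):
    """Insert a copy of v[i:i+k] immediately after itself.

    This models one tandem-duplication error of length k in the
    read vector v, starting at position i.
    """
    if k < 1 or i < 0 or i + k > len(v):
        raise ValueError("duplicated block must lie inside v")
    return v[:i + k] + v[i:i + k] + v[i + k:]


# Example with q = 2, n = 4, ell = 2, k = 2.
v = read_vector("0110", 2)
w = tandem_duplicate(v, 0, 2)
```

In this example the first two window compositions are repeated, so the corrupted read vector `w` is two entries longer than `v`; a duplication-correcting code must allow `v` to be recovered from such a `w`.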