Home
Scholarly Works
Improved Rank-Modulation Codes for DNA Storage...
Journal article

Improved Rank-Modulation Codes for DNA Storage With Shotgun Sequencing

Abstract

A common method for reading information stored in DNA molecules is shotgun sequencing. This method outputs a histogram of the frequencies of all the molecules’ substrings of a given length $\ell $ . To protect against noisy readings, the rank-modulation scheme encodes the information in the relative ranking of the substring frequencies, instead of their absolute values. However, the best rank-modulation codes for shotgun sequencing have low rates which are asymptotically vanishing. In this paper we propose new constructions of rank-modulation codes for shotgun sequencing. The first code construction is systematic, allowing the user to arbitrarily set the frequencies of a large subset of the substrings, which the encoder then completes to a permutation that may be realized by a DNA molecule. The construction is then improved by allowing the user to set the frequencies of additional substrings, at the cost of imposing constraints on the frequencies. The resulting codes have higher, non-vanishing rates, compared with previously known codes. As an example, for histograms of substrings of length $\ell =2$ , and an alphabet of size 4 (as in DNA molecules), we are able to construct a code with rate $\approx 0.909$ , whereas previously, the best construction resulted in a code with rate $\approx 0.654$ . Additionally, the encoded information in our construction may be written to shorter DNA molecules than possible before. We also prove that the systematic codes constructed in this paper are the largest possible among all systematic codes.

Authors

Beeri N; Schwartz M

Journal

IEEE Transactions on Information Theory, Vol. 68, No. 6, pp. 3719–3730

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

June 1, 2022

DOI

10.1109/tit.2022.3152392

ISSN

0018-9448

Contact the Experts team