Home
Scholarly Works
The Entropy Rate of Some Pólya String Models
Journal article

The Entropy Rate of Some Pólya String Models

Abstract

We study random string-duplication systems, which we call Pólya string models. These are motivated by a class of mutations that are common in most organisms and lead to an abundance of repeated sequences in their genomes. Unlike previous works that study the combinatorial capacity of string-duplication systems, or in a probabilistic setting, various string statistics, this work provides the exact entropy rate or bounds on it, for several probabilistic models. The entropy rate determines the compressibility of the resulting sequences, as well as quantifying the amount of sequence diversity that these mutations can create. In particular, we study the entropy rate of noisy string-duplication systems, including the tandem-duplication, end-duplication, and interspersed-duplication systems, where in all cases we study duplication of length 1 only. Interesting connections are drawn between some systems and the signature of random permutations, as well as to the beta distribution common in population genetics.

Authors

Elishco O; Hassanzadeh FF; Schwartz M; Bruck J

Journal

IEEE Transactions on Information Theory, Vol. 65, No. 12, pp. 8180–8193

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

December 1, 2019

DOI

10.1109/tit.2019.2936556

ISSN

0018-9448

Contact the Experts team