Home
Scholarly Works
Counting Distinct Strings
Journal article

Counting Distinct Strings

Abstract

Abstract. This paper discusses how to count and generate strings that are ``distinct'' in two senses: p -distinct and b -distinct. Two strings x on alphabet A and x' on alphabet A' are said to be p -distinct iff they represent distinct ``patterns''; that is, iff there exists no one—one mapping from A to A' that transforms x into x' . Thus aab and baa are p -distinct while aab and ddc are p -equivalent. On the other hand, x and x' are said to be b -distinct iff they give rise to distinct border (failure function) arrays: thus aab with border array 010 is b -distinct from aba with border array 001 . The number of p -distinct (resp. b -distinct) strings of length n formed using exactly k different letters is the [k,n] entry in an infinite p' (resp. b' ) array. Column sums p[n] and b[n] in these arrays give the number of distinct strings of length n . We present algorithms to compute, in constant time per string, all p -distinct (resp. b -distinct) strings of length n formed using exactly k letters, and we also show how to compute all elements p'[k,n] and b'[k,n] . These ideas and results have application to the efficient generation of appropriate test data sets for many string algorithms.

Authors

Moore D; Smyth WF; Miller D

Journal

Algorithmica, Vol. 23, No. 1, pp. 1–13

Publisher

Springer Nature

Publication Date

January 1, 1999

DOI

10.1007/pl00009247

ISSN

0178-4617

Labels

Contact the Experts team