# [seqfan] Conjecture on required alphabet size to exhibit all correlation classes of pairs of words

Paul C. Leopardi paul.leopardi at iinet.net.au
Wed Jan 14 23:42:22 CET 2009

```Dear Seqfans,
Last month I presented a talk at the 4th International Conference on
Combinatorial Mathematics and Combinatorial Computing in Auckland New
Zealand, entitled "Accurate computation of the variance of the number of
missing words in a random string"
http://wwwmaths.anu.edu.au/~leopardi/4ICC-2008-Leopardi-words-strings-talk.pdf

I've also added two new sequences to the OEIS:
http://www.research.att.com/~njas/sequences/A152139
http://www.research.att.com/~njas/sequences/A152959

A correlation class for a pair of different words is defined as an equivalence
class of 2*2 correlation matrices, M=[XX XY; YX YY], where X and Y are
different words of length m in an alphabet of size q, and the correlations
XX, XY, YX, YY are binary words of length m as defined by Guibas and Odlyzko
[2, Section 1].

The equivalence relation is given by equivalence under transpose of the matrix
and equivalence under exchange of X and Y. Rahmann and Rivals [3, Section
3.3] call the equivalence classes "types of correlation matrices".

Let b(m,q) be the number of correlation classes of pairs of different q-ary
words of length m. Then A152139(m*(m-1)+q) = b(m,q), for q from 1 to 2*m; and
A152959(m)=b(m,4), the number of correlation classes of pairs of different
words of length m in an alphabet of size 4. In other words, for m > 1,
A152959(m) = A152139(m*(m-1)+4).

Trivially, b(m,q) = b(m,2*m) for q > 2*m. Given the first terms of A152139, an
obvious conjecture is that b(m,q) = b(m,4) for all q > 4.

My strategy for proving this conjecture has been to use an inductive proof
based on the common length m of two words X and Y in an alphabet of size q. I
want to show that there exist two words X' and Y' of length m in an alphabet
of size 4 which give the same correlation class as X and Y. So far, for some
special cases of the correlation classes of X and Y I can either directly
construct X' and Y' (base cases) or I can construct X' and Y' assuming that
the conjecture is true for certain subwords of X and Y (inductive steps). The
proofs lean heavily on the Guibas and Odlyzko  result that an
autocorrelation of a word X in an alphabet of size q is also an
autocorrelation of a word X' in a binary alphabet. I haven't yet addressed
all possible cases, so the proof is not yet complete. Is there a better
strategy?

Best regards, Paul

 Leo J. Guibas and Andrew M. Odlyzko, Periods in Strings, Journal of
Combinatorial Theory Series A, 30 (1981), 19–43.

 Leo J. Guibas and Andrew M. Odlyzko, String overlaps, pattern matching,
and nontransitive games, Journal of Combinatorial Theory Series A, 30 (1981),
183-208.

 Sven Rahmann and Eric Rivals, On the distribution of the number of missing
words in random texts, Combinatorics, Probability and Computing (2003) 12,
73-87.

PS. MathIsFun (for Doron Zeilberger, at least)

```