# [seqfan] Conjecture on required alphabet size to exhibit all correlation classes of pairs of words

Paul C. Leopardi paul.leopardi at iinet.net.au
Wed Jan 14 23:42:22 CET 2009

```Dear Seqfans,
Last month I presented a talk at the 4th International Conference on
Combinatorial Mathematics and Combinatorial Computing in Auckland New
Zealand, entitled "Accurate computation of the variance of the number of
missing words in a random string"
http://wwwmaths.anu.edu.au/~leopardi/4ICC-2008-Leopardi-words-strings-talk.pdf

I've also added two new sequences to the OEIS:
http://www.research.att.com/~njas/sequences/A152139
http://www.research.att.com/~njas/sequences/A152959

A correlation class for a pair of different words is defined as an equivalence
class of 2*2 correlation matrices, M=[XX XY; YX YY], where X and Y are
different words of length m in an alphabet of size q, and the correlations
XX, XY, YX, YY are binary words of length m as defined by Guibas and Odlyzko
[2, Section 1].

The equivalence relation is given by equivalence under transpose of the matrix
and equivalence under exchange of X and Y. Rahmann and Rivals [3, Section
3.3] call the equivalence classes "types of correlation matrices".

Let b(m,q) be the number of correlation classes of pairs of different q-ary
words of length m. Then A152139(m*(m-1)+q) = b(m,q), for q from 1 to 2*m; and
A152959(m)=b(m,4), the number of correlation classes of pairs of different
words of length m in an alphabet of size 4. In other words, for m > 1,
A152959(m) = A152139(m*(m-1)+4).

Trivially, b(m,q) = b(m,2*m) for q > 2*m. Given the first terms of A152139, an
obvious conjecture is that b(m,q) = b(m,4) for all q > 4.

My strategy for proving this conjecture has been to use an inductive proof
based on the common length m of two words X and Y in an alphabet of size q. I
want to show that there exist two words X' and Y' of length m in an alphabet
of size 4 which give the same correlation class as X and Y. So far, for some
special cases of the correlation classes of X and Y I can either directly
construct X' and Y' (base cases) or I can construct X' and Y' assuming that
the conjecture is true for certain subwords of X and Y (inductive steps). The
proofs lean heavily on the Guibas and Odlyzko [1] result that an
autocorrelation of a word X in an alphabet of size q is also an
autocorrelation of a word X' in a binary alphabet. I haven't yet addressed
all possible cases, so the proof is not yet complete. Is there a better
strategy?

Best regards, Paul

[1] Leo J. Guibas and Andrew M. Odlyzko, Periods in Strings, Journal of
Combinatorial Theory Series A, 30 (1981), 19–43.

[2] Leo J. Guibas and Andrew M. Odlyzko, String overlaps, pattern matching,
and nontransitive games, Journal of Combinatorial Theory Series A, 30 (1981),
183-208.

[3] Sven Rahmann and Eric Rivals, On the distribution of the number of missing
words in random texts, Combinatorics, Probability and Computing (2003) 12,
73-87.

PS. MathIsFun (for Doron Zeilberger, at least)

```