[seqfan] Conjecture on required alphabet size to exhibit all correlation classes of pairs of words

Wed Jan 14 23:42:22 CET 2009

Dear Seqfans,
Last month I presented a talk at the 4th International Conference on 
Combinatorial Mathematics and Combinatorial Computing in Auckland New 
Zealand, entitled "Accurate computation of the variance of the number of 
missing words in a random string"
http://wwwmaths.anu.edu.au/~leopardi/4ICC-2008-Leopardi-words-strings-talk.pdf

I've also added two new sequences to the OEIS:
http://www.research.att.com/~njas/sequences/A152139
http://www.research.att.com/~njas/sequences/A152959

A correlation class for a pair of different words is defined as an equivalence 
class of 2*2 correlation matrices, M=[XX XY; YX YY], where X and Y are 
different words of length m in an alphabet of size q, and the correlations 
XX, XY, YX, YY are binary words of length m as defined by Guibas and Odlyzko 
[2, Section 1].

The equivalence relation is given by equivalence under transpose of the matrix 
and equivalence under exchange of X and Y. Rahmann and Rivals [3, Section 
3.3] call the equivalence classes "types of correlation matrices". 

Let b(m,q) be the number of correlation classes of pairs of different q-ary 
words of length m. Then A152139(m*(m-1)+q) = b(m,q), for q from 1 to 2*m; and 
A152959(m)=b(m,4), the number of correlation classes of pairs of different 
words of length m in an alphabet of size 4. In other words, for m > 1, 
A152959(m) = A152139(m*(m-1)+4).

Trivially, b(m,q) = b(m,2*m) for q > 2*m. Given the first terms of A152139, an 
obvious conjecture is that b(m,q) = b(m,4) for all q > 4.

My strategy for proving this conjecture has been to use an inductive proof 
based on the common length m of two words X and Y in an alphabet of size q. I 
want to show that there exist two words X' and Y' of length m in an alphabet 
of size 4 which give the same correlation class as X and Y. So far, for some 
special cases of the correlation classes of X and Y I can either directly 
construct X' and Y' (base cases) or I can construct X' and Y' assuming that 
the conjecture is true for certain subwords of X and Y (inductive steps). The 
proofs lean heavily on the Guibas and Odlyzko [1] result that an 
autocorrelation of a word X in an alphabet of size q is also an 
autocorrelation of a word X' in a binary alphabet. I haven't yet addressed 
all possible cases, so the proof is not yet complete. Is there a better 
strategy?

Best regards, Paul

[1] Leo J. Guibas and Andrew M. Odlyzko, Periods in Strings, Journal of 
Combinatorial Theory Series A, 30 (1981), 19–43.

[2] Leo J. Guibas and Andrew M. Odlyzko, String overlaps, pattern matching, 
and nontransitive games, Journal of Combinatorial Theory Series A, 30 (1981), 
183-208.

[3] Sven Rahmann and Eric Rivals, On the distribution of the number of missing 
words in random texts, Combinatorics, Probability and Computing (2003) 12, 
73-87. 

PS. MathIsFun (for Doron Zeilberger, at least)