simple statistics on binary words

Emeric Deutsch deutsch at duke.poly.edu
Sun May 7 17:23:04 CEST 2006


Dear seqfans,

I'd like to find out whether you know something about the
enumeration of binary sequences of length n containing k
subsequences of a given kind? Or any literature on it.

The only examples I have found in OEIS are about the
subsequences 01 (A034867) and 00 (A076791), both contributed
by Roger Cuculiere.

Recently, I have contributed A118390 (for 000), A118424
(for 010), A118429 (010), A118869 (0101), A118884 (0011),
A118890 (0110), A118897 (0000).

I have derived the generating functions via the so-called
symbolic method (a la Flajolet). For the simplest case of
"number of 0's" it goes like: 
every binary sequence is of the form
(1)                        empty or 0b or 1c,
where b and c are binary sequences (possibly empty). Then
the g.f. G=G(t,z) [z marks size and t marks 0's] satisfies:
                        G = 1 + tzG + zG.
However, for most of the other cases, my derivation,
basically along the same lines, is extremely cumbersome.
For example, for the number of 0101's in binary sequences
of length n, instead of the decomposition (1) I have written
(2)    empty or 0 or 1 or 00 or 01 or 10 or 11 or
000b or 001c or 010d or 011e or 100f or 101g or 110h or 111j,
where b,c,...,j are binary sequences (possibly empty).
I have introduced auxiliary variables:
      x if sequence starts with 1 but not 101;
      y if sequence starts with 01;
      u if sequence starts with 101.
For the g.f. H=H(t,x,y,u,z) one finds
(3)  H = 1 + (1+x)z + (1+y+2x)z^2 + (x+y)z^3 G +
      (1+2x)z^3 B + (1+u) z^3 C + yz^3 D,
where G=H(t,1,1,1,z), B=H(t,1,1,t,z), C=H(t,1,t,1,z), and
D = H(t,t,1,t^2,z).
All we need is G. Letting x=y=u=1, then x=y=1, u=t, then
x=1,y=t, u=1, and finally x=t,y=1,u=t^2, we find four
equations with unknowns G, B, C, and D, 
leading to
             G = [1+(1-t)z^2]/[1-2z+(1-t)z^2*(1-z)^2].

Do you know of or do you see a simpler derivation?

Also the 0110's are equidistributed with the 0010's and the
0001's with the 0011's (all I have so far is the g.f. proof).

The g.f.'s for the cases 01, 001, 0001 follow the same
pattern. As a matter of fact, for these cases the derivation
of the g.f. G(t,z) is easy: for 001, from (1) we obtain
G(t,z)=1+zG(t,z)+zG(t,z) but it is wrong because it takes
care only of those 001's that occur entirely within b and c.
The 001's that are at the very beginning (occurring in 001d)
are counted but not marked by t. Consequently, we subtract
z^3*G and add tz^3*G:
 	G = 1+2zG+z^3*G-t*z^3G.
Sorry, for going into so much detail.

I'd appreciate any input, including existing literature.

Thanks,
Emeric





More information about the SeqFan mailing list