[seqfan] A new sequence counting the number of distinct languages recognized by DFAs

Thomas Baruchel baruchel at gmx.com
Mon Jul 22 17:40:01 CEST 2024


My Fellow SeqFans,

most of us are aware of the recent achievement concerning BB(5), which inspired me to look for a (much easier) side project related to Theory of Computation.

I decided to compute the initial terms of the following sequence: the number of distinct languages over a binary alphabet recognized by a DFA with n states. I was pretty sure I was reinventing the wheel and I expected to quickly find this sequence in the OEIS already, but it turns out that is not the case (unless I made some mistake in my computation).

Short explanation: a DFA with a single state (n=1) can recognize five distinct languages:

   * empty language;
   * language containing only the empty word;
   * language contaning any word made with 'a's (including the empty word);
   * language contaning any word made with 'b's (including the empty word);
   * language contaiing any word made with 'a's and 'b's (including the empty word).

Of course more than 5 single-state DFAs could be devised, but different DFAs would recognize the same language (they are equivalent).

More values can be computed because all related subproblems are decidable, such as checking if two DFAs are equivalent or not. I took on this problem as a one-afternoon coding challenge; without putting much effort into optimizing the computation, and finally found:

   a(1) = 5
   a(2) = 104
   a(3) = 4170
   a(4) = 222354   (The term 222354 doesn't seem to be currently in the OEIS.)

I computed these four values with a brute-force approach, and I doubt my current program could be able to compute a(5). Right now, computing a(4) requires about one hour with my piece of code written in C.

I would appreciate it if someone could independently check my values (maybe some existing library related to DFAs could be used). I can obviously share my code, but I found coding the algorithm to be interesting and challenging and hope someone would enjoy doing it themselves.

From a more formal point of view, it is easy to see that a(n) <= (n+1)^(2*n) * 2^n ; indeed, we have to figure out the endpoint for 2n transitions, which can be any of the n states or nothing (hence the n+1 part of the formula); once the main structure of the DFA is devised, we have to choose which of the n states are accepting states (hence the 2^n part).

We can check that many of these "different" DFAs share the same language, indeed:

   a(1) = 5        <=   8
   a(2) = 104      <=   324
   a(3) = 4170     <=   32768
   a(4) = 222354   <=   6250000

I am not sure if some arbitrary value could be chosen for a(0) or if leaving this term undefined is the way to go.

Best regards,

--
Thomas Baruchel


More information about the SeqFan mailing list