[seqfan] Re: A new sequence counting the number of distinct languages recognized by DFAs

Thomas Baruchel baruchel at gmx.com
Mon Jul 22 23:19:24 CEST 2024


On Mon, 22 Jul 2024, brad klee wrote:
>> I would appreciate it if someone could independently check my
>> values (maybe some existing library related to DFAs could be used).
>
> Arthur makes a good point--It would be easier to help if you
> could explain more the details of your schema and algorithms.

Hi Brad,

I will try to tell more about it. Obviously two different variants
of the idea can be implemented, but the most important part of the
idea is: enumerating DFA's of a given size and counting distinct
languages (DFAs recognizing the same language have to be counted
once even if their graph isn't exactly the same). Thus, the most
interesting and challenging part of the coding is recognizing
equivalent DFAs.

Indeed, two definitions can be chosen:

  - if DFAs have to be complete, then every transition for each
    symbol is defined from each state and a(1)=2;
  - if the transition functions are allowed to be partial, then
    some transitions are allowed to remain undefined (in which case
    a word can be rejected when encountering such a symbol, and a(1)=5.

Currently, my code does the following things:

  - enumerating each DFA of a given size (both definitions above are
    easy to implement);
  - checking if the current DFA is equivalent to a previous DFA (if
    it is the case, it is added to a list; otherwise it is discarded);
    for testing the equivalence, a "parallel BFS search" is performed on
    two DFAs until detecting a prefix leading one of the DFA into an
    accepting state while the other DFA reaches a non-accepting state;
    if the BFS search explores the whole graph with no conflict, both
    languages are known to be the same;
  - storing a "canonical representant" of DFA for each ditsinct
    language.

Each DFA is encoded as an integer:

  - last n bits of the integer are a mask describing which states are
    accepting states;
  - left part of the integer is a n-radix (or n+1 according to the exact
    definition of the DFAs) integer of length 2n, where digits at
    position 2k give the next state from the state 'k' when reading
    the symbol 0, while each digit at position 2k+1 gives the next
    state from the state 'k' when reading the symbol 1.

I use n-radix for the complete DFAs (each digit between 0 and n-1 is
used for encoding the next state) and (n+1)-radix for uncomplete DFAs
(the digit n being used for an undefined transition).

For instance, take a complete DFA of size n=3; then DFA with id 5445 is
the following:

  5445 = ( 2*3^5 + 2*3^4 + 1*3^3 + 0 *3^2 + 1*3 + 2 ) * 8 + 5

The three last bits (encoded by the value 5) means that state q0 and q2
are accepting state (because 5 = 101_(2)).

The left part of the integer helps building the following table:

  (q0, 0) -> q2
  (q0, 1) -> q1
  (q1, 0) -> q0
  (q1, 1) -> q1
  (q2, 0) -> q2
  (q2, 1) -> q2

> It sounds like what you've done is to allow free-floating edges
> that are assumed as pointing to rejections? Or perhaps instead
> of free-floating edges, you've omitted them entirely?

Yes indeed, it is the system I use. However, I could easily hack my
code for the other system; leading to the two following sequences:

  - complete DFAs : 2, 26, 1054, 57068, …
  - uncomplete DFAs : 5, 104, 4170, 222354, …

And it seems that neither of these sequences currently are in the OEIS.

I can indeed provide some pictures; for instance here is the case
  a(1) = 5

  - empty language:
    https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%7D

  - language containing only the empty word:
    https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%7D

  - language containing words made of only 0s:
    https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D0%20%5D%0A%7D

  - language containing words made of only 1s:
    https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D1%20%5D%0A%7D

  - universal language:
    https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D


If we choose the "complete DFAs" version, the two distinct automatas are:

  https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D

and

    https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D



Here are now two "different" 3-states DFAs sharing the same language:

  (first version)  https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20q1%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20q2%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%22%20%5D%0A%20%20q0%20-%3E%20q1%20%5B%20label%3D%221%22%20%5D%0A%20%20q1%20-%3E%20q2%20%5B%20style%3Dinvis%20%5D%0A%20%20q1%20-%3E%20q1%20%5B%20label%3D%220%22%20%5D%0A%20%20q1%20-%3E%20q0%20%5B%20label%3D%221%22%20%5D%0A%20%20q2%20-%3E%20q2%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D

  (second version) https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20q1%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20q2%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%22%20%5D%0A%20%20q0%20-%3E%20q1%20%5B%20label%3D%221%22%20%5D%0A%20%20q1%20-%3E%20q2%20%5B%20style%3Dinvis%20%5D%0A%20%20q1%20-%3E%20q1%20%5B%20label%3D%220%22%20%5D%0A%20%20q1%20-%3E%20q2%20%5B%20label%3D%221%22%20%5D%0A%20%20q2%20-%3E%20q2%20%5B%20label%3D%220%22%20%5D%0A%20%20%20%20q2%20-%3E%20q1%20%5B%20label%3D%221%22%20%5D%0A%7D


Here is my C code for the complete DFAs version: https://pastebin.com/BGvm9Br6  (hacked from the following one)
Here is my C code for the uncomplete DFAs version: https://pastebin.com/vhmnp0eE

I hope it may help understanding my idea. Again, I suggest focusing
initially on the "equivalent DFAs" part rather on the choosing such or
such definition of the DFAs, since the most interesting part of the
compuation is there IMHO.

Best regards,

-- 
Thomas Baruchel


More information about the SeqFan mailing list