[seqfan] Re: A new sequence counting the number of distinct languages recognized by DFAs
Thomas Baruchel
baruchel at gmx.com
Mon Jul 22 23:19:24 CEST 2024
On Mon, 22 Jul 2024, brad klee wrote:
>> I would appreciate it if someone could independently check my
>> values (maybe some existing library related to DFAs could be used).
>
> Arthur makes a good point--It would be easier to help if you
> could explain more the details of your schema and algorithms.
Hi Brad,
I will try to tell more about it. Obviously two different variants
of the idea can be implemented, but the most important part of the
idea is: enumerating DFA's of a given size and counting distinct
languages (DFAs recognizing the same language have to be counted
once even if their graph isn't exactly the same). Thus, the most
interesting and challenging part of the coding is recognizing
equivalent DFAs.
Indeed, two definitions can be chosen:
- if DFAs have to be complete, then every transition for each
symbol is defined from each state and a(1)=2;
- if the transition functions are allowed to be partial, then
some transitions are allowed to remain undefined (in which case
a word can be rejected when encountering such a symbol, and a(1)=5.
Currently, my code does the following things:
- enumerating each DFA of a given size (both definitions above are
easy to implement);
- checking if the current DFA is equivalent to a previous DFA (if
it is the case, it is added to a list; otherwise it is discarded);
for testing the equivalence, a "parallel BFS search" is performed on
two DFAs until detecting a prefix leading one of the DFA into an
accepting state while the other DFA reaches a non-accepting state;
if the BFS search explores the whole graph with no conflict, both
languages are known to be the same;
- storing a "canonical representant" of DFA for each ditsinct
language.
Each DFA is encoded as an integer:
- last n bits of the integer are a mask describing which states are
accepting states;
- left part of the integer is a n-radix (or n+1 according to the exact
definition of the DFAs) integer of length 2n, where digits at
position 2k give the next state from the state 'k' when reading
the symbol 0, while each digit at position 2k+1 gives the next
state from the state 'k' when reading the symbol 1.
I use n-radix for the complete DFAs (each digit between 0 and n-1 is
used for encoding the next state) and (n+1)-radix for uncomplete DFAs
(the digit n being used for an undefined transition).
For instance, take a complete DFA of size n=3; then DFA with id 5445 is
the following:
5445 = ( 2*3^5 + 2*3^4 + 1*3^3 + 0 *3^2 + 1*3 + 2 ) * 8 + 5
The three last bits (encoded by the value 5) means that state q0 and q2
are accepting state (because 5 = 101_(2)).
The left part of the integer helps building the following table:
(q0, 0) -> q2
(q0, 1) -> q1
(q1, 0) -> q0
(q1, 1) -> q1
(q2, 0) -> q2
(q2, 1) -> q2
> It sounds like what you've done is to allow free-floating edges
> that are assumed as pointing to rejections? Or perhaps instead
> of free-floating edges, you've omitted them entirely?
Yes indeed, it is the system I use. However, I could easily hack my
code for the other system; leading to the two following sequences:
- complete DFAs : 2, 26, 1054, 57068, …
- uncomplete DFAs : 5, 104, 4170, 222354, …
And it seems that neither of these sequences currently are in the OEIS.
I can indeed provide some pictures; for instance here is the case
a(1) = 5
- empty language:
https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%7D
- language containing only the empty word:
https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%7D
- language containing words made of only 0s:
https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D0%20%5D%0A%7D
- language containing words made of only 1s:
https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D1%20%5D%0A%7D
- universal language:
https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D
If we choose the "complete DFAs" version, the two distinct automatas are:
https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D
and
https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D
Here are now two "different" 3-states DFAs sharing the same language:
(first version) https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20q1%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20q2%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%22%20%5D%0A%20%20q0%20-%3E%20q1%20%5B%20label%3D%221%22%20%5D%0A%20%20q1%20-%3E%20q2%20%5B%20style%3Dinvis%20%5D%0A%20%20q1%20-%3E%20q1%20%5B%20label%3D%220%22%20%5D%0A%20%20q1%20-%3E%20q0%20%5B%20label%3D%221%22%20%5D%0A%20%20q2%20-%3E%20q2%20%5B%20label%3D%220%2C1%22%20%5D%0A%7D
(second version) https://dreampuf.github.io/GraphvizOnline/#digraph%20G%20%7B%0A%20%20layout%3Ddot%0A%20%20rankdir%3DLR%0A%20%20start%20%5B%20style%3Dinvis%20%5D%0A%20%20q0%20%5Bshape%20%3D%20circle%20%5D%0A%20%20q1%20%5Bshape%20%3D%20doublecircle%20%5D%0A%20%20q2%20%5Bshape%20%3D%20circle%20%5D%0A%20%20start%20-%3E%20q0%0A%20%20%0A%20%20q0%20-%3E%20q0%20%5B%20label%3D%220%22%20%5D%0A%20%20q0%20-%3E%20q1%20%5B%20label%3D%221%22%20%5D%0A%20%20q1%20-%3E%20q2%20%5B%20style%3Dinvis%20%5D%0A%20%20q1%20-%3E%20q1%20%5B%20label%3D%220%22%20%5D%0A%20%20q1%20-%3E%20q2%20%5B%20label%3D%221%22%20%5D%0A%20%20q2%20-%3E%20q2%20%5B%20label%3D%220%22%20%5D%0A%20%20%20%20q2%20-%3E%20q1%20%5B%20label%3D%221%22%20%5D%0A%7D
Here is my C code for the complete DFAs version: https://pastebin.com/BGvm9Br6 (hacked from the following one)
Here is my C code for the uncomplete DFAs version: https://pastebin.com/vhmnp0eE
I hope it may help understanding my idea. Again, I suggest focusing
initially on the "equivalent DFAs" part rather on the choosing such or
such definition of the DFAs, since the most interesting part of the
compuation is there IMHO.
Best regards,
--
Thomas Baruchel
More information about the SeqFan
mailing list