[seqfan] Re: A new sequence counting the number of distinct languages recognized by DFAs

Mon Jul 22 18:44:09 CEST 2024

On Mon, Jul 22, 2024 at 11:53 AM Thomas Baruchel via SeqFan <
seqfan at list.seqfan.eu> wrote:

> My Fellow SeqFans,
>
> most of us are aware of the recent achievement concerning BB(5), which
> inspired me to look for a (much easier) side project related to Theory of
> Computation.
>
> I decided to compute the initial terms of the following sequence: the
> number of distinct languages over a binary alphabet recognized by a DFA
> with n states. I was pretty sure I was reinventing the wheel and I expected
> to quickly find this sequence in the OEIS already, but it turns out that is
> not the case (unless I made some mistake in my computation).
>
> Short explanation: a DFA with a single state (n=1) can recognize five
> distinct languages:
>
>    * empty language;
>    * language containing only the empty word;
>    * language contaning any word made with 'a's (including the empty word);
>    * language contaning any word made with 'b's (including the empty word);
>    * language contaiing any word made with 'a's and 'b's (including the
> empty word).
>

(Suggestion: use '0's and '1's, or else in your explanation above say
explicitly "a binary alphabet {'a', 'b'}." It took me a moment to realize
why you didn't include "a word made entirely of 'c's," etc.; and why you
weren't considering your third and fourth languages to be isomorphic.)

Re: a(0), I think you need to nail down more precisely what you mean by "a
DFA with n states." If you mean that you can draw it with n little circular
"spots", plus arrows leading off-screen to "accept" and/or "reject"
outcomes, then a(0) is obviously at least 2:
- no spots, just an arrow labeled "accept" [the universal language]
- no spots, just an arrow labeled "reject" [the empty language]
But normally, I think, the best way to count n-state DFAs is to require
*every* state to be "on-screen," and never let the cursor escape the system
— make it a completely closed system — and just label some set of "spots"
themselves as "accepting". So your five automata above would be represented
as:

==the empty language==
State 1: a->1, b->1
==the language containing only the empty word==
State 1: a->2, b->2, accepting
State 2: a->2, b->2
==the language of 'a's only==
State 1: a->1, b->2, accepting
State 2: a->2, b->2
==the language of 'b's only==
State 1: a->2, b->1, accepting
State 2: a->2, b->2
==the universal language==
State 1: a->1, b->1, accepting

In that notation, there are only two 1-state DFAs possible at all. (Three
of your five languages take 2 states to represent.) And then I think you're
correct that there is no physical meaning to a "zero-state DFA", because it
can't physically be worked — suppose we have a DFA with no start node and
no transitions, and I feed it the letter 'a' — what does it do? Physically
impossible question!

Using the traditional way to count DFA states, we certainly have:
  a(1) = 2 [the empty language and the universal language]
We also have:
  a(n) mod 2 = 0
because for any DFA that accepts L, just flip all of its acceptance flags
and voila, you have a DFA that accepts ~L. For example, your tally of "5"
above failed to count
  ==the language containing *everything but* the empty word==
  State 1: a->2, b->2
  State 2: a->2, b->2, accepting
But your tally for a(2) is still missing e.g.
  ==the language of all words ending in 'a'==
  State 1: a->2, b->1
  State 2: a->2, b->1, accepting
  ==the language of all words *not* ending in 'a'==
  State 1: a->2, b->1, accepting
  State 2: a->2, b->1
  ==the language of all words ending in 'b'==
  State 1: a->1, b->2
  State 2: a->1, b->2, accepting
  ==the language of all words *not* ending in 'b'==
  State 1: a->1, b->2, accepting
  State 2: a->1, b->2

I'm going to resist getting nerdsniped any further than this. But I bet
you'll find that this sequence is already *somewhere* in OEIS.

HTH,
Arthur