[seqfan] Re: The contamination sequence

David Seal david.j.seal at gwynmop.com
Mon Apr 6 14:53:59 CEST 2020


Hi Éric,

> yes, I guess you pointed the problem 
> in the right way: for me 11 and 9876543210
> are not exactly the same type of number (here).
> 11 is immediately forbidden.
> 9876543210 is not.
> A greedy algorithm will discard 11
> because 11 embeds a contradiction;
> the same algorithm will accept
> 9876543210 and the seq will stop.

I think you mean 987654321 (which forbids every no-leading-zeros continuation) rather than 9876543210 (which allows every no-leading-zeros continuation)!

> This is not exactly the same (for me).
> Backtracking is not needed with 11.
> You say it is with 9876543210.
> But the aims are different in this case.
> Without backtracking the seq stops
> at some point (I guess -- this is 
> precisely the conjecture).
> With a "simple" backtracking it might 
> grow forever.
> But are we sure that a simple backtracking 
> would then be enough?
> Will we not need a not-so-simple
> backtracking to go on when the
> simple backtracking extends the
> seq for, say, 10000 terms then leads
> the seq into a dead end?

Your definition says "The contamination sequence: a digit d must be separated at least by d non-d digits from any other digit d; this is the lexicographically earliest sequence of distinct integers with this property." It doesn't place any restriction on the algorithm used to calculate terms of the sequence. If the algorithm to determine terms of the sequence needs to backtrack for 10000 terms, then it has to do that or give up with an indication that it's failed, because the definition doesn't restrict the amount of backtracking needed to generate the sequence. With a "lexicographically earliest sequence with property P" definition like that, there are three possibilities: no infinite sequences with property P exist, or infinite sequences with property P do exist and one of them is lexicographically earlier than all the others, or infinite sequences with property P do exist but no matter which one of them you pick, there is a lexicographically earlier such sequence. Those three possibilities say that the sequence is finite, infinite and non-existent respectively.

In the case of your definition, the sequence 10, 1010, 101010, ... shows that infinite sequences with the "digits d separated by at least d non-d digits" property do exist, which rules out the first of those possibilities. So the sequence defined by your definition is either infinite or non-existent.

If you want to restrict yourself to the greedy algorithm, you need a definition like "The greedy contamination sequence: the sequence generated from the empty sequence by successively appending the least not-previously-used positive integer to it that leaves the sequence still having the property that digits d are separated by at least d non-d digits, for d each of 0 to 9". That basically specifies the algorithm used to generate the sequence in the definition (which I will call the 'blindly greedy algorithm' in what follows): if that algorithm runs into a dead end, the sequence is finite; otherwise it's infinite. But it's a different sequence definition from the one you gave, and it defines a different sequence. In particular, it defines a finite sequence and the definition you gave defines an infinite sequence. See later in this post for a demonstration of the second part of that; here's a proof by contradiction of the first part, i.e. that as you suspected, the blindly greedy algorithm does eventually run into a dead end, terminating the sequence.

To start that proof, assume that the blindly greedy algorithm doesn't ever run into a dead end. Then it never leaves the sequence it has generated so far with its last nine digits being 987654321 (regardless of how those digits are split among terms).

Consider the following set S of nine positive integers:

  {1000000000987654321,
   2000000000987654321,
   3000000000987654321,
   4000000000987654321,
   5000000000987654321,
   6000000000987654321,
   7000000000987654321,
   8000000000987654321,
   9000000000987654321}

I claim that at every stage of the blindly greedy algorithm (on the assumption that it never runs into a dead end), at least one of those integers is an allowed next term of the sequence. For that not to be the case, they would all have to be disallowed. None of them can be disallowed for having appeared before, since that would have been a previously-encountered dead end, so they would all have to be disallowed for violating the "digits d separated by d non-d digits" property. But 1000000000987654321 can only violate that property if the last digit of the sequence so far is 1, and 2000000000987654321 can only violate it if the last two digits of the sequence so far include a 2, so they can only both violate it if the last two digits of the sequence so far are 21 in that order. And 3000000000987654321 can only violate it if the last three digits of the sequence so far include a 3, so they can only all three violate it if the last three digits of the sequence so far are 321 in that order. And so on, until we conclude that it's only possible for all nine elements of S to violate the property if the last nine digits of the sequence so far are 987654321 in that order - but that would mean that the blindly greedy algorithm has just arrived at a dead end, contrary to assumption.

So if the blindly greedy algorithm never arrives at a dead end, then at least one element of S is an allowed next term at every one of its stages, and so at least one integer <= 9000000000987654321 is an allowed next term at every one of its stages. But since the blindly greedy algorithm simply chooses the smallest allowed next term at every stage, that means that it selects an integer <= 9000000000987654321 at every stage. But clearly it can do that for at most 9000000000987654321 stages, so it has to arrive at a dead end when it tries to generate its 9000000000987654322nd term at the very latest - and that's the required contradiction to the assumption that it never arrives at a dead end. (One can tighten that bound on its length quite a bit by observing that it can never select any 'forever banned' integers like 11, by the way, but I won't bother because I'm only trying to show here that its length is finite, not to establish exactly what that finite length is.)

> I would appreciate a deep investigation 
> about the fate of the sequence _without_ any backtracking at all
> (simple or smart). If, like the "Automatic Ant",
> the seq turns into a process tout
> describe (extension with 10, 1010,
> 101010, etc.) I would jump of joy!

It won't become that sequence from any point onwards, essentially because there are infinite sequences with the property that grow more slowly. For example, 1, 20, 120, 1201, 20120, 120120, 1201201, 20120120, 120120120, ... grows by one digit per term rather than two. So if the sequence became 10, 1010, 101010, 10101010, ... from 1010 onwards, for example, I could look to change:

  earlier terms, 1010, 101010, 10101010, 1010101010, 101010101010, ...

into:

  earlier terms, 1010, 20120, 120120, 1201201, 20120120, ...

That will produce a lexicographically earlier sequence with the property, unless one of 20120, 120120, 1201201, 20120120, ... appears in the earlier terms. There are only finitely many earlier terms, so there is a largest earlier term that appears in 20120, 120120, 1201201, 20120120, ... If that largest earlier term is 20120 or 120120, instead replace it by:

  earlier terms, 1010, 101010, 1201201, 20120120, 120120120, ...

to get a lexicographically earlier sequence with the property. Or if that largest earlier term is 1201201 or 20120120, instead replace it by:

  earlier terms, 1010, 101010, 10101010, 120120120, 1201201201, ...

and so on. So whatever that largest earlier term is, I can get a lexicographically earlier infinite sequence with the property than the sequence that becomes 1010, 101010, 10101010, ... at some point. And similar arguments say I can do the same for one that becomes 101010, 10101010, 1010101010, ... at some point, and for one that becomes 10101010, 1010101010, 10101010101010, ... at some point, etc. In short, any infinite sequence with the property that it 'Automatic Ants' its way into the 10, 1010, 101010, 10101010, ... sequence from some point onwards can be transformed into a lexicographically earlier infinite sequence with the property, and so is not the lexicographically earliest sequence with the property.

Basically, my sequence 10, 1010, 101010, 10101010, ... was *only* intended to demonstrate that infinite sequences with the property exist, not as a candidate for an "Automatic Ant" track that your sequence might eventually settle into. And by the way, neither is my sequence 1, 20, 120, 1201201, 20120120, ... such a candidate, because slower-growing infinite sequences exist - an example is the following, which is to be read down the first column, then down the second, then down the third, etc:

     1,      12101,      121012101,     1210121012101,
     2,      21012,      210121012,     2101210121012,
    10,     101210,     1012101210,    10121012101210,
    12,     121012,     1210121012,          ...
   101,    1012101,    10121012101,
   210,    2101210,    21012101210,
  1210,   12101210,   121012101210,

That grows by 4 digits every 7 terms, so quite a lot more slowly than a digit per term. And I'm pretty sure that a yet more slowly-growing infinite sequence can be designed, and one that grows even more slowly than that one, and so on...

I said above that I would demonstrate that the definition you gave defines an infinite sequence, which I will now do by describing an algorithm that generates it without backtracking (i.e. once the algorithm has produced a term, it won't subsequently run into a dead end and be forced to retract it). It's essentially a 'not-quite-blindly greedy' algorithm that looks just one digit ahead, by generating the smallest integer that doesn't violate the "digits d separated by at least d non-d digits" property *and* allows the digit that follows it to be nonzero. I roughly described that algorithm before:

> > In fact, I think that the sequence can be generated in a non-backtracking
> > way, provided we look one digit ahead of the digits that will be appended
> > by the candidate integer we're considering appending to it. Specifically,
> > rather than just checking that the current candidate N doesn't breach the
> > "digits d separated by at least d non-d digits" property, it should check
> > each of nine possibilities for the property: N followed by a 1 at the start
> > of the following term, N followed by a 2 at the start of the the following
> > term, etc, up to N followed by a 9 at the start of the following term. If
> > all of those possibilities breach the property (as in the case N =
> > 987654321), reject N and try the next candidate term to be appended; if at
> > least one of those possibilities doesn't breach the property, accept N
> > because we know that the sequences with the property can be continued
> > infinitely after appending N. (Proof: if appending N followed by a nonzero
> > digit d doesn't breach the property, neither does appending N followed by
> > further terms that produce the digits sequence d0d0d0d0..., and we can
> > always produce such further terms by spacing the commas sufficiently far
> > apart. The result of doing so won't be the lexicographically smallest
> > infinite continuation of the sequence with N appended, but it will be a
> > legitimate infinite continuation with N appended and so establishes that
> > backtracking from a decision to append N won't be needed.)

A more precise formulation of that algorithm:

Top-level algorithm
-------------------

Set the sequence so far to the empty sequence, then repeatedly generate the next term as described below and append it to the sequence so far.

Next-term-generation algorithm
------------------------------

This will work with a queue of integers N that have property (A) below and at least one of properties (B1), (B2) and (B3):

(A) The sequence so far with N appended has the "digits d separated by at least d non-d digits" property.

(B1) N has just one digit.

(B2) N with its last digit removed (i.e. FLOOR(N/10)) already appears in the sequence.

(B3) N's last digit is 0, i.e. N is divisible by 10.

In the following, the instruction "test/add N" means to first check whether N has property (A). If it doesn't, move on to the next thing to do. If it does, then check whether it has any of properties (B1), (B2) and (B3), and if it has at least one of them, add it to the end of the queue and move on to the next thing to do. Otherwise (i.e. if N has property (A) and none of (B1), (B2) and (B3)), output FLOOR(N/10) as the next term of the sequence, terminating this next-term-generation algorithm.

Initialise by starting with an empty queue and then test/adding 1, 2, 3, 4, 5, 6, 7, 8, 9 in that order. Note that this always produces a non-empty queue, because:

* The first time we perform this next-term-generation algorithm, the sequence so far is empty, so all of 1 to 9 have properties (A) and (B1), and they are all placed in the queue.

* On subsequent occasions that we perform this next-term-generation algorithm, we know that the last time ended by appending n = FLOOR(N/10) to the sequence for some N that was not divisible by 10. We know that:

  (previous sequence terms), N

had the "digits d separated by at least d non-d digits" property when that was done, and that it resulted in the sequence we see now being (previous sequence terms), n, where n = FLOOR(N/10). We also know that N was not divisible by 10 at that point, and so N = 10n + r, with r being one of the digits 1 to 9. So the sequence so far with r appended is:

  (previous sequence terms), n, r

which has exactly the same digits in the same order as (previous sequence terms), N and so also has the "digits d separated by at least d non-d digits" property. So r has properties (A) and (B1) and has therefore been placed in the queue.

So at the end of this initialisation, the queue is always non-empty, and since all the integers we have test/added have property (B1), this next-term-generation algorithm has not yet terminated.

After that initialisation, repeatedly do the following until this next-term-generation algorithm terminates: remove the element at the start of the queue from the queue, calling it n, then test/add 10n, 10n+1, 10n+2, 10n+3, 10n+4, 10n+5, 10n+6, 10n+7, 10n+8, 10n+9 (i.e. n with each of the digits 0 to 9 appended) in that order. Note that it is always possible to do this, because every time we remove an element from the queue, we either output the next term and terminate, or we add one or more integers that are at least 10n to the end of the queue (since appending a 0 digit never creates a violation of the "digits d separated by at least d non-d digits" property). I.e. for as long as this continues, the number of elements in the queue cannot decrease, and so the queue cannot empty.

Note also that it always will terminate. To prove this, it's easy to show inductively that the elements in the queue are always in ascending order and they're all at least f and less than 10f, where f is the first element in the queue. So as we progress, f moves inexorably upwards as we remove elements from the start of the queue, and add elements that are all in the range 10f to 10f+9 to the end of the queue, and we will therefore eventually either terminate or get to the point where the largest term in the sequence so far is less than FLOOR(f/10). At that point, we know that none of the elements of the queue can have either property (B1) or property (B2), so they must all have property (B3). In particular, f must have property (B3) and so end with 0. When we remove f from the queue and test/add 10f, it will have property (A), since f has it and appending a 0 cannot create a violation of property (A), and it will not have property (B1) or (B2) because it's bigger than f, which is already too big to have them, but it will have property (B3) and so will be added to the end of the queue. But then we will test/add 10f+1 and that will also have property (A), since we know that the sequence so far with f appended has property (A) and has last digit 0, and therefore that appending 1 to it cannot create a violation of property (A), and it will have none of properties (B1), (B2) and (B3). So the algorithm will terminate at that point.

To show that this algorithm produces the lexicographically earliest infinite sequence with the "digits d separated by at least d non-d digits" property requires a fair amount of detailed reasoning. An outline (with various bits of detailed reasoning left as exercises for the reader) is as follows:

The presence of an n-digit integer i in the queue asserts that if one constructs the next term of the lexicographically earliest sequence, concatenated with the first digit of the term after it, the algorithm has not yet determined whether the first n digits of the result are that integer. In the general case, that covers the possibilities (a) that the next term has at least n digits, starting with the digits of i, or (b) that the next term has exactly n-1 digits and is FLOOR(i/10), and the first digit of the term after that is i MOD 10. In two special cases, only possibility (a) exists because (b) describes an impossible situation: if n=1, it's impossible for a positive integer to have n-1 digits, and if i is divisible by 10, i MOD 10 = 0 cannot be the first digit of a term.

The presence of that n-digit integer i in the queue also indicates that those digits do not violate the "digits d separated by at least d non-d digits" property with regard to each other (for example, 11 will never appear in the queue) or with regard to the sequence so far, and if neither of the special cases applies, it indicates that FLOOR(i/10) appears in the sequence so far. It does not indicate anything about whether later digits of the sequence to come do or don't violate the "digits d separated by at least d non-d digits" property, nor about whether any integer whose first n digits are i appears in the sequence so far or not.

For example, the presence of:

* 1256 in the queue covers possibilities that the next term is 1256, that it is a 5-digit or longer integer whose first four digits are 1256, and that it is 125 with the first digit of the following term being 6. It indicates that 125 appears in the sequence so far, and that it has already been checked that the last digit of the sequence so far isn't 1, 2, 5 or 6 and neither the last-but-one nor the last-but-two digit of the sequence so far is 5 or 6.

* 1250 in the queue covers possibilities that the next term is the next term is 1250 and that it is a 5-digit or longer integer whose first four digits are 1250. It does not indicate anything about whether 125 is in the sequence so far, but it does indicate that it has already been checked that the last digit of the sequence so far isn't 1, 2, 5 or 6, and that neither the last-but-one nor the last-but-two digit of the sequence so far is 5.

Together, all the integers in the queue indicate the full range of possibilities for the concatenation of the next term of the sequence with the first digit of the term that follows it - i.e. the queue as a whole asserts that cases its integers don't cover aren't possible. As stated above, those integers are in ascending order, and so the element at the start of the queue will have the minimal value of FLOOR(element/10) among the elements of the queue (though it's not necessarily the unique minimum, e.g. 1256 at the start of the queue might be followed by one or more of 1257, 1258 and 1259, and they all have FLOOR(element/10) = 125).

The process of removing the element i at the start of the queue and add/trying 10i, 10i+1, ..., 10i+9 fully resolves one possibility for FLOOR(i/10) to be the next term of the sequence, either concluding that it is and terminating generation of the next term, or concluding that it cannot be the next term in the way covered by i and replacing i with a set of further integers that cover all the other possibilities that were previously covered by i. 

So after doing that successively for all the elements i at the start of the queue that share the same value of FLOOR(i/10), the question of whether the next term of the sequence is FLOOR(i/10) is fully resolved: either it definitely is and generation of the next term has been completed, or all possibilities that it is have been removed from the set of possibilities covered by the queue. The proof above that the next-term-generation algorithm cannot fail and must terminate says that an infinite sequence will be generated without running into a dead end, and the ordering of the queue ensures that each term is the least that it can be, given all previously-generated terms. Finally, the way the terms are generated ensures that the sequence so far doesn't violate the "digits d separated by at least d non-d digits" property at any stage during its generation, and since that property is a purely 'local' one (it can only be violated between two terms separated by at most eight other terms) that implies that the property isn't violated by the entire infinite sequence.

So the sequence is the lexicographically earliest infinite sequence that has the "digits d separated by at least d non-d digits" property.

David



More information about the SeqFan mailing list