[seqfan] Re: D(n)

Sat Dec 21 22:45:40 CET 2019

I've found that the title of the 'Bag of Digits' discussion has prompted some productive thoughts about this one. They have given me two further values, namely a(10) = 111367788889 and a(11) = 11112222333445666777778899, and told me that a(12) has at least 152 digits. Here's how...

The key point is that D() is really a function that applies to bags (or more technically, multisets) of digits. For example, when applying D() repeatedly to integers, we have the following results:

  2566 --> 2512 --> 451
  2656 --> 2125 --> 415
  2665 --> 2125 --> 415
  5266 --> 5212 --> 541
  5626 --> 5122 --> 514
  5662 --> 5122 --> 514
  6256 --> 1225 --> 145
  6265 --> 1225 --> 145
  6526 --> 1252 --> 145
  6562 --> 1252 --> 145
  6625 --> 1225 --> 145
  6652 --> 1252 --> 145

But these are all basically just digit-order variations on applying the equivalent function to the multiset of digits:

  {2,5,6,6} --> {1,2,2,5} --> {1,4,5}

As applied to bags of digits, the function splits its argument into ten multisets of digits, each containing all the copies of one of the digits (some of the multisets may well be empty), transforms each of those multisets into a multiset containing all the digits of the sum of the contents (transforming empty multisets to empty multisets), and then takes the multiset union of the results. E.g. {6,6,2,5} splits into {6,6}, {2}, {5} and seven copies of {}, which transforms to {1,2}, {2}, {5} and seven copies of {}, whose multiset union is {1,2,2,5}.

There are two interesting points about looking at it that way from the point of view of investigating the function. The first is that since the size of the multiset union is just the sum of the sizes of the multisets that go into it, we only need to look at how multisets of identical digits are transformed to gather information about how the size of D(m) relates to the size of a multiset m. That information can be summarised in the following table:

Digit  Number of extra copies in input required to produce:
       1st output digit  2nd output digit  3rd output digit ...
---------------------------------------------------------------
  0           1                 -                 -         ...
  1           1                 9                90         ...
  2           1                 4                45         ...
  3           1                 3                30         ...
  4           1                 2                22         ...
  5           1                 1                18         ...
  6           1                 1                15         ...
  7           1                 1                13         ...
  8           1                 1                11         ...
  9           1                 1                10         ...

This table is intended to be used as a table of costs of 'buying' output digits with the indicated digit (with '-' indicating that 'buying' more than one output digit with 0s is impossible). For example, just one copy of 7 produces an output digit: {7} --> {7}. And once we've got that one, just one more 7 produces a second output digit: {7,7} --> {1,4}. But once we got those two, thirteen more 7s are needed to produce a third output digit: {7*14} --> {8,9} isn't enough, but {7*15} --> (0,1,5} is (using the notation digit*copies in multisets to avoid the very cumbersome {7,7,7,7,7,7,7,7,7,7,7,7,7,7} and {7,7,7,7,7,7,7,7,7,7,7,7,7,7,7} in this case and much worse in what follows...).

We can therefore 'buy' the first fifteen output digits for one input digit each - i.e. it's possible for D() to transform a 15-digit multiset into another 15-digit multiset: this happens specifically for {0,1,2,3,4,5*2,6*2,7*2,8*2,9*2} --> {0*2,1*6,2*2,3,4*2,6,8}. But if I want D() to produce more than 15 output digits, I've got to 'pay' at least two input digits for each one of the excess over 15 - i.e. D() reduces the size of all >15-digit multisets. And the 'cost' of each additional output digit rises quite steeply:

* To get a 16th output digit, we've got to have at least 17 input digits, with 17 being achievable precisely for {0,1,2,3,4*3,5*2,6*2,7*2,8*2,9*2} --> {0*2,1*7,2*3,3,4,6,8}.

* To get a 17th output digit, we've got to have at least 20 input digits, with 20 being achievable precisely for {0,1,2,3*4,4*3,5*2,6*2,7*2,8*2,9*2} --> {0*2,1*8,2*4,4,6,8}.

* To get an 18th output digit, we've got to have at least 24 input digits, with 24 being achievable precisely for {0,1,2*5,3*4,4*3,5*2,6*2,7*2,8*2,9*2} --> {0*3,1*9,2*3,4,6,8}.

* To get a 19th output digit, we've got to have at least 33 input digits, with 33 being achievable precisely for {0,1*10,2*5,3*4,4*3,5*2,6*2,7*2,8*2,9*2} --> {0*4,1*9,2*3,4,6,8}.

* To get a 20th output digit, we've got to have at least 43 input digits, with 43 being achievable precisely for {0,1*10,2*5,3*4,4*3,5*2,6*2,7*2,8*2,9*12} --> {0*5,1*9,2*3,4,6,8}.

* To get a 21st output digit, we've got to have at least 54 input digits, with 54 being achievable precisely for {0,1*10,2*5,3*4,4*3,5*2,6*2,7*2,8*13,9*12} --> {0*6,1*9,2*3,4*2,8}.

* To get a 22nd output digit, we've got to have at least 67 input digits, with 67 being achievable precisely for {0,1*10,2*5,3*4,4*3,5*2,6*2,7*15,8*13,9*12} --> {0*7,1*9,2*3,4,5,8}.

* Etc, etc, etc...

So if we can determine the number n of iterations of D() required to reduce every <=15-digit multiset to a stable multiset (i.e. one with no repeated digits), i.e. the maximum n such that a(n) has <= 15 digits, then we know that:

* every multiset that requires n+1 iterations to reduce it to a stable multiset has at least 16 digits, i.e. a(n+1) >= 10^15;

* every multiset that requires n+2 iterations to reduce it to a stable multiset has at least 17 digits, i.e. a(n+2) >= 10^16;

* every multiset that requires n+3 iterations to reduce it to a stable multiset has at least 20 digits, i.e. a(n+3) >= 10^19;

* every multiset that requires n+4 iterations to reduce it to a stable multiset has at least 43 digits, i.e. a(n+4) >= 10^42;

* every multiset that requires n+5 iterations to reduce it to a stable multiset has at least 43+11+13+15+18+22+30+45+90+100+112+128+150+180+225+300+450+900+1000+1125+1286+1500+1800+2250 = 11793 digits (using the cost table above extended with two more columns), i.e. a(n+5) >= 10^11792;

* and so on - but the minimum number of digits the next step of the argument shows that the number of digits a(n+6) must have would make this contribution rather enormous (and don't even think about a(n+7)...). So I'll stop trying to give further minimum values at this point!

So basically, if we can establish the values of D(m) for all multisets m containing <= 15 digits and use that to determine the largest n such that a(n) < 10^15, we know that at most 4 more values of a(i) beyond that will have reasonable length, and at most one beyond that even halfway-unreasonable length.

The second way in which looking at D() as a function on multisets of digits rather than on integers is interesting becomes relevant at this point. It is that establishing the values of D(m) for all multisets m containing <= 15 digits is a much smaller job than establishing the values of D(n) for all integers n <= 10^15. Specifically, there are (9 CHOOSE 9) + (10 CHOOSE 9) + (11 CHOOSE 9) + (12 CHOOSE 9) + (13 CHOOSE 9) + ... + (24 CHOOSE 9) = (25 CHOOSE 10) = 3268760 such multisets m (including the empty multiset, which doesn't actually arise from any integer's digits but can be regarded as a harmless duplicate of {0}), compared with 1000000000000000 such integers n. This is based on the fact that there are (c+9) CHOOSE 9 multisets containing c digits, which can be shown by means of a 1-1 correspondence between such multisets and bitstrings of length c+9 that contain precisely nine 1s. That 1-1 correspondence is that such a multiset corresponds to the bitstring containing as many 0s as there are 0s in the multiset, followed by a 1, followed by as many 0s as there are 1s in the multiset, followed by a 1, followed by as many 0s as there are 2s in the multiset, followed by a 1, etc, until we end with as many 0s as there are 9s in the multiset. For instance, the multiset {0,1,2,2,4,5,5,5,6,7,9} corresponds to the bitstring 01010011010001010110.

This changes the memory required to store all such values from a smallish number of petabytes to a smallish number of megabytes, making it much more feasible in practice, and reduces the time required to calculate them all by a similar factor. And we can generalise the encoding described above of multisets containing c digits in c+9 bits, nine of which are 1s, by prefixing it with an extra 1 preceded by as many 0s as desired (those 0s basically representing non-existent digits, to get an encoding of the multisets containing c digits as binary integers i in the range 2^(c+9) <= i < 2^(c+10) whose binary encodings contain ten 1s. These ranges fit together cleanly to produce an encoding of multisets containing <= C bits as (C+10)-bit binary integers containing ten 1s - e.g. with C=15:

  00000 00000 00000 11111 11111 encodes {}

  00000 00000 00001 01111 11111 encodes {0}
  00000 00000 00001 10111 11111 encodes {1}
  00000 00000 00001 11011 11111 encodes {2}
  ...
  00000 00000 00001 11111 11110 encodes {9}

  00000 00000 00010 01111 11111 encodes {0,0}
  00000 00000 00010 10111 11111 encodes {0,1}
  00000 00000 00010 11011 11111 encodes {0,2}
  ...
  00000 00000 00010 11111 11110 encodes {0,9}
  00000 00000 00011 00111 11111 encodes {1,1}
  00000 00000 00011 01011 11111 encodes {1,2}
  00000 00000 00011 01101 11111 encodes {1,3}
  ...
  00000 00000 00011 01111 11110 encodes {1,9}
  00000 00000 00011 10011 11111 encodes {2,2}
  ...
  ...
  00000 00000 00011 11111 11100 encodes {9,9}

  00000 00000 00100 01111 11111 encodes {0*3}
  ...
  ...
  ...
  11111 11111 00000 00000 00000 encodes {9*15}

The order in which the multisets appear when sorted (as here) in increasing order of the integer value of this encoding is lexicographic, first by the number of digits in the multiset in increasing order, then by the number of 0s in the multiset in decreasing order, then by the number of 1s in the multiset in decreasing order, then by the number of 2s in the encoding in decreasing order, etc, through to the number of 9s in the multiset in decreasing order (though the last criterion that is actually needed is the number of 8s in the multiset in decreasing order, since the number of digits and the numbers of 0s, 1s, 2s, 3s, 4s, 5s, 6s, 7s and 8s jointly imply the number of 9s).

A useful consequence of this exact definition of the encoding and sort order is that if D(m) is not equal to m (i.e. if at least one digit appears twice or more in m), D(m) always appears earlier in the encoded list than m, since:

A) applying D() reduces the number of digits in any multiset that contains two or more of any of the digits 0-4 or three or more of any of the digits 5-9, so that D(m) appears before m in the encoded list on the number-of-digits criterion;

B) applying D() leaves the number of digits unchanged in any multiset to which A) does not apply but that contains exactly two of at least one of the digits 5-9, and makes at least one of the changes {5,5} --> {0,1}, {6,6} --> {1,2}, {7,7} --> {1,4}, {8,8} --> {1,6} and {9,9} --> {1,8} to the digits in D(m) compared with those in m. All of those changes replace two digits >= 5 with two digits that always include a 1 and sometimes a 0, so that D(m) appears before m in the encoded list on a tie on the number-of-digits criterion plus in some cases having more 0s plus in all cases having more 1s.

As a result, if we work upwards through the integer values of the encodings of m as (C+10)-bit numbers which have ten 1s, recording how many iterations of D() each needs to become stable, we will always either find that D(m) = m, implying that m needs 0 iterations to become stable, or we will already have calculated how many iterations D(m) needs to become stable, implying that m requires one more iteration than that number. So we can calculate the number of iterations each multiset requires to become stable in a simple single pass through those encodings. And the final ingredient is that there is a coding trick to work through all N-bit numbers that have M bits set in them that doesn't require one to look at and reject all the ones that have other numbers of bits set in them. For brevity, I'll refer those interested in the trick to Knuth, "The Art of Computer Programming, volume 4A - Combinatorial Algorithms Part 1" for it - it's described in section 7.2.1.3 "Generating all combinations" subsection "Lexicographic generation" and in exercise 7.1.3-20 "Gosper's hack".

Based on the above, I've written a C program to work through all multisets of up to 54 digits in that order, 54 being chosen because that makes the encodings be 64-bit integers with ten set bits. There are 64 CHOOSE 10 = 23930713170 such encodings - a large but manageable number of integers to work through. It's a rather less manageable number of iteration counts to store, but we only need to store iteration counts for values of m that might be produced as D(m') for a later m'. We know that 54 input digits to D() can produce at most 21 output digits, which can all be encoded as 31-bit integers with ten set bits. So we only need to store 31 CHOOSE 10 = 44352165 iteration counts, an easily manageable number - and I've chosen to store them pretty sparsely in an array of 2^31 iteration counts to simplify and speed up the process of looking up the iteration count for an encoding.

The results of the program start:

First multiset needing 1 iterations to stabilise is {0,0}.
First multiset needing 2 iterations to stabilise is {0,5,5}.
First multiset needing 3 iterations to stabilise is {1,6,6}.
First multiset needing 4 iterations to stabilise is {6,8,8}.
First multiset needing 5 iterations to stabilise is {4,4,6,8}.
First multiset needing 6 iterations to stabilise is {2,2,4,6,8}.
First multiset needing 7 iterations to stabilise is {1,1,2,4,6,8}.
First multiset needing 8 iterations to stabilise is {1,2,4,6,9,9}.
First multiset needing 9 iterations to stabilise is {1,6,7,8,9,9,9}.
First multiset needing 10 iterations to stabilise is {1,1,1,3,6,7,7,8,8,8,8,9}.
First multiset needing 11 iterations to stabilise is {1,1,1,1,2,2,2,2,3,3,3,4,4,5,6,6,6,7,7,7,7,7,8,8,9,9}.

The first couple of those results do show that some care is needed when the first multiset needing a certain number of iterations to stabilise contains one or more zeros, because there may not be any integer with the digits in the multiset (as is the case for {0,0} and would be for any bigger all-0s multiset) or there may be one but it's not the smallest integer which requires that number of iterations: 055 --> 010 --> 01 would be the smallest integer requiring 2 iterations if leading zeros were allowed, but as they're not, the smallest integer with digits {0,5,5} is 505 --> 100 --> 10, and that's beaten by 112 --> 22 --> 4.

So the program is not necessarily a good guide to the smallest integer requiring n iterations when the multiset it produces contains 0s. However, the remaining results do not contain 0s, and so they say that:

a(3) is 166 --> 112 --> 22 --> 4

a(4) is 688 --> 616 --> 112 --> 22 --> 4

a(5) is 4468 --> 868 --> 166 --> 112 --> 22 --> 4

a(6) is 22468 --> 4468 --> 868 --> 166 --> 112 --> 22 --> 4

a(7) is 112468 --> 22468 --> 4468 --> 868 --> 166 --> 112 --> 22 --> 4

a(8) is 124699 --> 124618 --> 22468 --> 4468 --> 868 --> 166 --> 112 --> 22 --> 4

a(9) is 1678999 --> 167827 --> 161482 --> 26482 --> 4648 --> 868 --> 166 --> 112 --> 22 --> 4

confirming results already reported in this thread, and they add that:

a(10) is 111367788889 --> 33614329 --> 961429 --> 186142 --> 28642 --> 4864 --> 886 --> 166 --> 112 --> 22 --> 4

a(11) is 11112222333445666777778899 --> 4898518351618 --> 432910336 --> 4929106 --> 4182106 --> 428206 --> 44806 --> 8806 --> 1606 --> 1120 --> 220 --> 40

Since any integer that requires 11 iterations to reach an all-digits-different stable integer therefore has at least 26 digits, and the arguments above about the number of digits input to D() say that at least 1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+2+3+4+9+10+11+13+15+18+22+30 = 152 digits are needed to produce 26 output digits, one can conclude that a(12) has at least 152 digits. This goes well beyond the range of my program's search, so I stopped it before it completed - no point in giving it some days to weeks to work its way up to 54 digits when it's clear that no further results will be found!

David

> On 09 December 2019 at 19:24 David Sycamore via SeqFan <seqfan at list.seqfan.eu> wrote:
> 
> 
> Thanks Giovanni, that’s definitely an improvement. You did get the definition right (see drafts A330273 and A330284, where the definition of D(n) has now been improved by M.F. Hasler). 886 was my mistake, obviously 688 is better (no descending digits). However, for me 688 is a(4) not a(5), because the offset is 0. So we now have a(0) to a(9) inclusive, which is 2 up on yesterday. You have also shown that we can have numbers in which a digit occurs more than twice (in your a(9), 9 occurs three times). 
> Did you use a code to find these terms (so we are sure they satisfy the minimal condition?).  If not, is there anyone out there who would like to write a code for this sequence? It would be interesting to see what further terms there are, and in particular, to know if it’s finite or not. 
> Best
> David 
> 
> >> On 9 Dec 2019, at 13:32, Giovanni Resta <giovanni.resta at iit.cnr.it> wrote:
> >> 
> >> On 12/8/19 11:07 PM, David Sycamore via SeqFan wrote:
> >> 
> >> Combining the above with your results, the updated terms are now (up to a(7)):
> >> 1, 11, 112, 166, 886, 4468, 22468, 112468...
> > 
> > Uhm. If I got the definition right, a(1)-a(10) =
> > 
> > 
> > 1, 11, 112, 166, 688, 4468, 22468, 112468, 124699, 1678999, ...
> > 
> > In particular,
> > I got a(5)=688 (since 688 -> 616 -> 121 -> 22 -> 4) instead of a(5)=886.
> > 
> > Giovanni
> > 
> > --
> > Seqfan Mailing list - http://list.seqfan.eu/
> 
> 
> --
> Seqfan Mailing list - http://list.seqfan.eu/