Digital silliness

Wed Jan 11 18:13:44 CET 2006

"David Wilson" <davidwwilson at comcast.net> wrote:
:For a number n, let f(n) be the set of numbers gotten by splitting n^2 at the 0 
:digits.  For example
:
:29648^2 = 879003904
:
:so f(29648) = { 4, 39, 879 }
:
:Let S be the smallest set of numbers containing 2 and fixed by f.  What is the 
:largest element of S?

Interesting: I wonder whether the set is finite. Clearly it is for
certain smaller bases: letting a(n) be the largest element of S when
dealing with base n (so that the question posed asks for a(10)),
a(2) = a(4) = 2 are obvious; calculating by hand I found a(3) = 5575,
while a(5) quickly went beyond what was practical by hand.

I used the perl program below to investigate a(10). The code is written
to minimise memory use: it uses a bit-vector to record elements seen
in S for 1..65535, splits the remainder by the number of 32-bit words
required to represent them in binary, and for each distinct length
stores the elements seen as a packed string of the sorted numbers.

The elements seen but not yet expanded are saved in an array, and
a new array started each time the current one is exhausted to give
a handy point to output some diagnostics. The diagnostics are the
number of items in the pending list (elements seen but not yet
expanded), and for each wordsize, the number of elements of that
size seen. (The number of elements seen from the 1..65535 range is
not shown.)

I let the program run for 4 hours; it used about 64MB of memory, saw
around 1.5 million elements, and produced this output:

pend 1; seen []         # pend is 2
pend 1; seen []         # pend is 2^2
pend 1; seen []         # pend is 2^4
pend 1; seen []         # pend is 2^8
pend 1; seen [1:1]      # pend is 2^16
pend 1; seen [1:1 2:1]  # pend is 2^32
pend 3; seen [1:3 2:1]
pend 5; seen [1:5 2:2]
pend 8; seen [1:7 2:5]
pend 20; seen [1:13 2:7]
pend 28; seen [1:23 2:12]
pend 44; seen [1:36 2:19 3:1]
pend 63; seen [1:61 2:33 3:3]
pend 111; seen [1:101 2:60 3:5]
pend 182; seen [1:158 2:102 3:8 4:2]
pend 282; seen [1:269 2:162 3:16 4:4]
pend 449; seen [1:436 2:270 3:34 4:6]
pend 728; seen [1:723 2:451 3:58 4:8]
pend 1116; seen [1:1188 2:760 3:98 4:11 6:1]
pend 1742; seen [1:1943 2:1252 3:160 4:23 5:2 6:1]
pend 2730; seen [1:3195 2:2028 3:265 4:43 5:5 6:1]
pend 4166; seen [1:5052 2:3298 3:450 4:62 5:9 6:2]
pend 6310; seen [1:7938 2:5240 3:776 4:110 5:16 6:5 7:1]
pend 9530; seen [1:12451 2:8285 3:1283 4:195 5:38 6:6 7:2]
pend 14522; seen [1:19295 2:13232 3:2041 4:306 5:57 6:7 7:2]
pend 21829; seen [1:29724 2:20919 3:3232 4:464 5:82 6:11 7:2]
pend 32481; seen [1:45592 2:32513 3:5181 4:737 5:122 6:18 7:2 8:1]
pend 48230; seen [1:69220 2:50350 3:8094 4:1166 5:184 6:26 7:3 8:1]
pend 71344; seen [1:104429 2:77602 3:12514 4:1807 5:290 6:43 7:3 8:1]
pend 105131; seen [1:156606 2:118694 3:19213 4:2778 5:424 6:64 7:9 8:2]
pend 154719; seen [1:233265 2:180500 3:29399 4:4237 5:630 6:89 7:12 8:2]
pend 227526; seen [1:346379 2:272320 3:44965 4:6356 5:939 6:130 7:19 8:2]
pend 334648; seen [1:513936 2:408656 3:67775 4:9537 5:1447 6:193 7:27 8:5]
pend 490340; seen [1:759202 2:610588 3:102018 4:14345 5:2167 6:313 7:44 8:9]

At each cycle the pending list grows by very close to 50%. While it is
possible that the growth will slow and eventually reverse as the density
of low numbers increases, I don't think these numbers show convincing
evidence of that happening yet.

Getting some more results for smaller bases may give a better indication
of the chances this will terminate. I plan to try converting the program
to take the base as a parameter.

Hugo
---
#!/usr/bin/perl -w
use strict;
use Math::Pari;
seen(2);
while (my $n = nextpend()) {
    for (split /0+/, $n * $n) {
        seen($_);
    }
}
final();
exit 0;

{
    my($v16, @list, @listsize);
    BEGIN { $v16 = "\0" x (65536/8) }
    sub seen {
        my $n = shift;
        if (length($n) <= 5 && $n < 65536) {
            return if vec($v16, $n, 1);
            vec($v16, $n, 1) = 1;
            pend($n);
            return;
        }
        $n = PARI($n);
        my($words, $value) = words($n);
        if (!$listsize[$words]) {
            $list[$words] = $value;
            $listsize[$words] = 1;
            pend($n);
            return;
        }
        my $s = \$list[$words];
        my($min, $max) = (0, $listsize[$words]);
        my($mid, $mv, $cmp);
        while ($min < $max) {
            $mid = int(($min + $max)/2);
            $mv = substr($$s, $mid * $words, $words);
            $cmp = $mv cmp $ value;
            return if $cmp == 0;
            my $repl = ($cmp < 0) ? \$min : \$max;
            last if $mid == $$repl;
            $$repl = $mid;
        }
        substr($$s, $max * $words, 0, $value);
        ++$listsize[$words];
        pend($n);
    }
    sub dumpstats {
        my $pend = shift;
        printf "pend %s; seen [%s]\n", scalar @$pend, join ' ',
                map { $listsize[$_] ? "$_:$listsize[$_]" : () } 0..$#listsize;
    }

    sub final {
        print "****** finished ******\n";
        dumpstats([]);
        my $words = $#listsize;
        my $value = substr($list[$words], -$words);
        printf "last value: %s:[%s]\n", $words, join ' ',
                map sprintf("%0x8", $_), unpack "L*", $value;
    }
}

{
    my($cur, $next);
    BEGIN { $cur = []; $next = []; }
    sub pend {
        push @$next, $_[0];
    }
    sub nextpend {
        return shift @$cur if @$cur;
        dumpstats($next);
        $cur = $next;
        $next = [];
        return shift @$cur;
    }
}

sub words {
    my $n = shift;
    # making untested assumptions here about Pari's internal format
    my $w = (Math::Pari::longword($n, 0) & 0xffffff) - 2;
    ($w, pack "L*", map Math::Pari::longword($n, $_+2), 0 .. $w - 1);
}
__END__