Base change notation

Wed Aug 2 22:34:05 CEST 2006

Apologies for the length of this response, but this is a subject I've 
devoted a lot of time and thought to.  Thanks in advance for your patience...

 >=franktaw at netscape.net
 > I really don't like the base change notation that is used for a 
number of OEIS entries.

Well, there's of course no accounting for taste, but please let me 
assure you a considerable amount of time and effort were expended 
arriving at that notation.

It's far from a gratuitous choice, although there may be aspects of 
its full intended usage that may not be immediately apparent (and 
which admittedly haven't been adequately documented).  I will try to 
explain some of this below.

By the way, I corresponded with a number of people regarding this 
notation before adopting it, including Don Knuth, who appreciated its 
motivations and didn't find it objectionable.  (I was particularly 
concerned by its possible collision with his use of brackets to 
denote extraction of power series coefficients).

In addition there was a certain amount of eMail back-and-forth with 
NJAS and others on the seqfan list to make sure it was reasonably 
acceptable before I started using it in OEIS entries.

So at this point it's become a little bit established by common 
usage.  Maybe it just needs more documentation somewhere?

All this is not to say we can't change it if the consensus is reached 
that it's in fact a bad idea.  But I'd at least hope this is done 
with due consideration for its origins.

 > This involves writing:
 > (b) [n] (c),
 > meaning "write n in base b, and reinterpret as base c".

That's a good but rough gloss, useful for quickly describing many 
simple cases, such as 4[n]2, aka A000695, which also happens to 
contain a somewhat lengthy comment (for an OEIS entry) on the 
notation.  I will try to recap some of the genesis here.

1. Strings versus integers.

One of the original motivations for trying to adopt *some* notation 
in the first place was that there seemed to be a subtle but 
significant bit of confusion that was creeping into the OEIS, which 
may be very simply exemplified by A007088, which is described as 
"Numbers written in base 2."  with entries 1, 10, 11, 100, 101, 110, 111...

That description, "Numbers written in base 2.", actually creates an 
unfortunate ambiguity as to what numeric VALUES the sequence 
represents.  Is it one, two, three... or is it one, ten, 
eleven...?  If you interpret the tokens as STRINGS then it's perhaps 
the former, but if you interpret them as DECIMAL INTEGERS it's the latter.

Now since it is called the "On-line Encyclopedia of INTEGER 
Sequences", not the "On-line Encyclopedia of Sequences of 
DECIMAL-INTEGER-LIKE STRINGS", the one-ten-eleven interpretation 
seems far more natural and reasonable.  For one thing, the OEIS 
doesn't accept submissions in, say, hexadecimal--how could binary be 
any different?

Therefore, acceptable submissions always conform to the implied 
constraint that they *must consist of a sequence of tokens whose 
printed representations are indistinguishable from decimal integers*.

Now, taking this just a small step further ("if it looks like a duck, 
then..." etc) it seems reasonable to take the position that OEIS 
entries not only just happen to LOOK like decimal integers, but the 
sequence terms really DO in fact consist of the actual VALUES that 
print that way.

Put another way, an "11" really and always means eleven, no matter 
what rule was used to compute it--even if intermediate terms in that 
computation are strings or other non-numeric objects and only the 
output step "rendered" it into an integer value.  Only when so 
"rendered" can the value be decimalized and presented as an a(n) 
value.  Interpreting some a(n)=11 as the value three is as 
inconsistent with the OEIS as allowing a sequence to have "F" instead 
of "15" for fifteen.

Furthermore, numerous features associated with the OEIS *rely* on 
this interpretation of the tokens as values.  For example generating 
function descriptions, the library of standard transforms, 
superseeker (which should be considered as intrinsic to the OEIS) and so on.

This may all seem like rather dry hairsplitting about something that 
appears obvious, but I was concerned because the OEIS has been 
afflicted with some fairly tortuous and confusing descriptions due to 
this--imagine an innocent user encountering, for example, the 
description of A032532 (submitted back in 1998):

"Integer part of decimal 'base 2 looking' numbers divided by their 
actual base 2 values (denominator of a(n) is n, numerator is n 
written in binary but read in decimal)."

And there are many other examples in the 135 pages of results 
returned just by searching on the phrase "written in"!

Hence some kind of notation appeared called for, and it seemed a very 
concise one would be best, so that adopting it wouldn't seem a 
burden, and so that its use would tend to encourage consistently 
searchable descriptions.

One might argue that there might be better choices than 2[n]10 for 
A007088, but it does at least fill the bill.

2. Bases, subscripts and ASCII

In trying to come up with a concise, ASCII-friendly, notation, one 
issue I struggled mightily with was the schoolbook convention of 
denoting the base of a numeral-string via a subscript.  Of course the 
most common ASCII convention for writing a subscript is brackets [.], 
so that, for example, one might write

   101[2]  =  5[10]

However I couldn't figure out how to make this jibe with that key 
values-not-strings interpretation, nor work well for notating 
sequences involving rebasing.  For example, describing A007088 as 
something like

   a(n)  =  (n[2])[10]

drags us back into the morass of expressions on strings versus 
numerical values, and is a confusing clutter with a very high 
punctuation to content ratio.

3. Indexing, Polynomials and Algebra

Fortunately the much more common usage of subscripts to indicate 
indexing suggested a happy alternative.  It is typical to write, say, 
S[n] to denote the n-th member of some sequence of objects S.

Indeed, when we write, say, 691, it is a compression of the expansion

   6 * 10^2 + 9 * 10^1 + 1 * 10^0.

Moreover, consider "the set of polynomials with non-negative 
coefficients less than ten"; call it Z10.  Then, in the natural 
ordering, the 691st element of this sequence would be written Z10[691].

And of course this generalizes readily to "other values of 10", such 
as Z2, "the set of polynomials with non-negative coefficients less 
than two", and so on.  Z2[691] is a ninth-degree polynomial, while 
Z10[691] is quadratic.

Now the convention for denoting function application is to write the 
actual parameter immediately to the right.  It is optionally 
parenthesized if needed for clarity, but there are many subjects (eg 
lambda calculus) where simple concatenation is the default.

So to map our 691st polynomial Z10[691] back into an integer, we 
merely apply it to the value 10 (aka "evaluated at x=10"--but that 
way of putting it introduces the needless new entity "x") which can 
be written out in full as

   Z10[691](10)  =  691

or more concisely as

   Z10[691]10  =  691

or, dropping the "Z" (which I'll try to justify further below)

   10[691]10  =  691

which can be generalized to the identity

   b[n]b  =  n

Suggesting, for example, more algebraic ideas, such as b[n]q being 
(approximately) the inverse of q[n]b.

The key general idea is that the brackets suggest indexing.  And then 
we simplify the "base" arguments to expose other relationships as 
much as possible.

But writing 2[n]10 is merely a slightly abbreviated alternative to 
the more obscure and verbose Z2[n](10).

And, for example, (1/p)[.]p has a natural symmetry with p[.](1/p), so 
why needlessly clutter up one side more than the other?

Note that b[n]q represents a mapping from numbers to numbers, without 
the need to invoke "string" concepts.  For integers it can be 
recursively defined by

   b[0]q  =  0

   b[n>0]q  =  (n % b) + q * b[ (n / b) ]q

(where % means remainder and / means quotient).

Moreover b[n]q specifies a perfectly reasonable function for real n, 
or for complex b, or symbolic q, or whatever.

4. Object-oriented ("numbral") arithmetic

More radically, the indexing interpretation doesn't constrain b to be 
a numerical "base" at all--it can be a more general "basis" whose 
elements follow some different algebra, such as GF2.  This allows us 
to encode its elements as GF2[n]2 so as to be able to refer to them 
in OEIS sequences, and to write identities such as

   (GF2[n]^2)2  =  Z2[n]4  (*)

where the types of the subexpressions are clear as is their 
applicative evaluation (a la lambda calculus)...

...but I'll spare you that digression, and try to address the other points:

 > The main problem is that it is not distinctive enough; it looks 
like it means b*n*c.

Perhaps only if you're in the habit of thinking of []s as being 
equivalent to ()s.  While in some contexts that's done, in 
typographically starved ASCII it's not all that usual--often []s are 
interpreted very differently than ()s.

 > Also, the "n" in the center can get a bit lost.

Yes, I can sympathize with that.  But it's just as often the case in 
the OEIS that the "n" is just a running index anyway.  If the "b" and 
"q" are just numbers, then it stands out by being a letter.  In 
fancier formulas any or all of the three might be complicated.

 > Instead, I propose writing:
 > n_{b->c}.
 > This avoids both problems I described above; and it is clear 
enough that I think someone might guess what it means.
 > (The "->" of course represents an arrow, which is what would be 
used if writing by hand or if typesetting.)

This is certainly an alternative, which I'd consider if all I ever 
wanted to do is convert between numerical bases.  But I'd be sad to 
lose all those other interesting allusions.  Writing

   n_{GF2->2}

seems more cluttered and confusing to me, and I can't see how to use 
it to write the above identity (*) which expresses that "A000695 
encodes the squares of GF2".

But, as there's no accounting for taste, I will happily defer to 
whatever NJAS and the OEIS community prefers.