[seqfan] Re: Guided browsing of the OEIS based upon personal preferences?

Sun Oct 25 17:56:56 CET 2009

There is an important difference between music and sequences: one is 
interested in music for itself, but one is interested in sequences for
what they are about.  This is an overstatement, of course, but it does 
suggest one important point: one is browsing the entirety of the 
entries in the database, not just the numeric sequences.

Properties like multiplicative and fractal are interesting for a
browser because they imply things about the kind of sequence.
Injective and monotone are not, because almost any kind of sequence might
have these properties.

One should not overlook the value of the manually generated 
cross-reference lines for a browser of this sort.  It would be nice to 
distinguish different kinds of cross-references:

* Similar sequences.
* Sequences used in defining this one.
* "Administrative" references, such as the row lengths for "tabf" 
sequences.
* Sequences which use this sequence in their definitions.

Of course, all too many sequences have few or no cross-references.

Another potentially useful indicator of similarity is references to the 
same articles and web sites.  Again, such references are unfortunately 
sparse in the database.  There are some weirdnesses here, too; one 
sequence may reference an article in Wikipedia while another references 
the corresponding article in the World of Mathematics; it is not easy 
to automatically determine when such references are equivalent.
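
A minimal sketch of how shared references could be turned into a
similarity score, assuming each entry's references have already been
reduced to a set of identifier strings (the function name and the
identifiers below are made up for illustration). It compares strings
exactly, so it does nothing about the equivalence problem just mentioned:

```python
def reference_similarity(refs_a, refs_b):
    """Jaccard similarity of two collections of reference identifiers.

    Returns 0.0 when either entry has no references, since sparse
    reference lists carry no evidence of similarity.
    """
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical reference identifiers for two entries.
print(reference_similarity({"paper-1", "site-2"}, {"site-2", "book-3"}))
```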

Franklin T. Adams-Watters

-----Original Message-----
From: Antti Karttunen <antti.karttunen at gmail.com>

An interesting link about Pandora, thanks!

Now, for any such browsing-algorithm (whether "personalized" or not)
to work in OEIS, it would need much more software-digestible information
about each entry than is currently available.
Currently, most of the searchable information is "extra-mathematical"
("contingent"), e.g. the submitter's name, the number of references
to academic papers, keywords concerning the quality or importance, such as
"nice", "dumb", "less" or "core", etc. These can already be
used in searches, e.g. entering "ref:" in the search field
gives any sequence with at least one reference.

However, to get closer to the idea behind Pandora, that such
human judgements should not matter, there should be many
more keywords, or tags, or "categories" (called that in the new Wiki-OEIS)
made easily available to the OEIS software.
Many of them could be automatically collected by bots, e.g.
given enough terms of the sequence, it's easy to conjecture
that it is, for example:

a) multiplicative (keyword "mult")
b) additive
c) monotone
d) injective
e) surjective (on N, not so easy for a program to detect in many cases)
f) consists of nonnegative values only (keyword "nonn")
g) consists of a certain subset of integers only, say {0, 1} or primes
h) is "continuous", i.e. the first differences consist of {-1, 0, 1} only
i) grows with a certain rate, e.g. linear, quadratic, exponential, etc.
j) etc. etc.
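
As a rough sketch of how a program might conjecture a few of the
properties listed above from a finite list of terms (the function name is
hypothetical, and so are all the tag names except "mult" and "nonn"):

```python
from math import gcd


def conjecture_properties(terms, offset=1):
    """Guess properties from the terms a(offset), a(offset+1), ...

    Every returned tag is only a conjecture based on the given terms.
    """
    a = {offset + i: t for i, t in enumerate(terms)}
    diffs = [y - x for x, y in zip(terms, terms[1:])]
    tags = set()

    if all(t >= 0 for t in terms):
        tags.add("nonn")
    if all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs):
        tags.add("monotone")
    if len(set(terms)) == len(terms):
        tags.add("injective")
    if all(d in (-1, 0, 1) for d in diffs):
        tags.add("continuous")
    if set(terms) <= {0, 1}:
        tags.add("binary")

    # Multiplicative: a(1) = 1 and a(m*n) = a(m)*a(n) whenever gcd(m, n) = 1.
    # Only coprime pairs whose product lies among the known terms can be
    # tested, and at least one such pair is required before tagging.
    if a.get(1) == 1:
        pairs = [(m, n) for m in a for n in a
                 if 1 < m < n and gcd(m, n) == 1 and m * n in a]
        if pairs and all(a[m * n] == a[m] * a[n] for m, n in pairs):
            tags.add("mult")
    return tags


# First 12 values of the sum-of-divisors function sigma(n).
print(conjecture_properties([1, 3, 4, 7, 6, 12, 8, 15, 13, 18, 12, 28]))
```

With so few terms only three coprime pairs can be checked for "mult",
which is why more terms (e.g. from a b-file) make the guess more credible.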

Of course, all these are categories on which no program can make a
conclusive decision based only on a finite subset of the terms of an
infinite sequence.
So, this "classification bot" would tag the entries only tentatively
with these classes, and then it would be the task of the editors to either
affirm these, or mark the entry with the corresponding "opposite tag"
(e.g. "non-nonn" is the opposite keyword of "nonn", currently called "sign"),
to tell the bot that no, this sequence is not in category X, although
it might look like that. These cases are probably quite rare, as this
implies that the first counter-example is so far away that it is not
practical to include so many terms in the database/b-file.
Another possibility is that the property remains conjectural,
and in that case it should be flagged as such. (E.g. "conj-prime"
if the sequence seems to take only prime values, but nobody has proved
it for sure.)
(Should we have a different tag/keyword-prefix for the conjectures
made by the bot-program and the conjectures acknowledged by
human beings?)
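
One way to picture this bookkeeping, as a sketch only: each tag on an
entry carries a status saying who proposed it and whether an editor has
ruled on it. The status names and the function below are assumptions made
for illustration, not anything the OEIS software provides.

```python
from enum import Enum


class Status(Enum):
    BOT_CONJECTURE = "bot"     # proposed by the program, not yet reviewed
    HUMAN_CONJECTURE = "conj"  # an editor agrees it looks true, no proof
    CONFIRMED = "yes"          # an editor affirmed the property
    REFUTED = "no"             # the "opposite tag": looks true, but is not


def merge_bot_tags(existing, bot_tags):
    """Add new bot conjectures without overriding any earlier decision."""
    merged = dict(existing)
    for tag in bot_tags:
        merged.setdefault(tag, Status.BOT_CONJECTURE)
    return merged


# An editor has already marked "nonn" as refuted; the bot proposes it again.
entry = {"nonn": Status.REFUTED}
print(merge_bot_tags(entry, {"nonn", "mult"}))
```

Keeping bot and human conjectures as separate statuses answers the
question above in the affirmative, but a single "conjectured" status with
an author field would serve equally well.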

Now, one could use time-stamping to lighten the task
of the bot (i.e. don't run the analyzer for already analyzed entries),
but care must be taken to rerun it on those entries where
terms have been added or corrected.
In any case, such a semi-automated bot would be a major improvement
over the current situation, where such analysis has been done
only sporadically by some brave individuals.
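
A small sketch of the "rerun only when the terms changed" idea. Note that
it swaps the time stamp for a fingerprint (hash) of the terms, which is a
different technique from the one suggested above but catches both added
and corrected terms; the function names are hypothetical.

```python
import hashlib


def fingerprint(terms):
    """Return a digest that changes whenever any term is added or altered."""
    data = ",".join(str(t) for t in terms).encode("ascii")
    return hashlib.sha256(data).hexdigest()


def needs_analysis(terms, stored_fingerprint):
    """True if the entry was never analyzed or its terms have changed."""
    return stored_fingerprint != fingerprint(terms)


stored = fingerprint([1, 1, 2, 3, 5, 8])
print(needs_analysis([1, 1, 2, 3, 5, 8], stored))      # terms unchanged
print(needs_analysis([1, 1, 2, 3, 5, 8, 13], stored))  # a term was added
```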

Unfortunately, most of these easily detectable categories are
not very interesting or distinguishing.
- "Ah, this sequence was monotone! Now please give me more monotone
sequences from this same author!"

So, there are still two ways to proceed.

A) A lot of manual categorization, as in Pandora's case. Make it
a task of the associate editors not just to edit and correct, but also
to find as many meaningful categories (nowadays still called "Index entries")
for the new entries as they can. Prepare some kind of
"Category FAQ" which lists the most useful categories to watch out for,
into which a sequence could fall. Certainly, one cannot rely on
many of the submitters knowing or caring about them themselves.

B) However, what I think people really want is analogues:

 - Is there a similar sequence, but based on partitions instead of
   combinations?

 - Some other automorphism applied to this same combinatorial structure?

 - Similar recurrence, but with slightly different parameters.
   (BTW: I see from the index entries that some people have already done
    a lot of work regarding this idea.)

 - What if instead of the primes we used here the irreducible elements
   of some other factorization domain? Or the leftovers from
   some other kind of sieve process (e.g. Lucky or Ludic numbers)?

 - A factorial-base analogue for this decimal-based sequence?

 - Base-3 or Zeckendorf-expansion analogue for this base-2 sequence?

 - What about using some other 2-ary function than just ordinary
   multiplication in this convolution formula?

 - etc. etc.

However, this would require that the OEIS software have transparent
access to the formula, gen.func. & other information on the %F, %C and
%Y lines, which is currently obfuscated in dozens of different ad hoc
notations and stray pieces of code in various, often proprietary
programming languages. So, I guess this is a long-term project, although
I have no doubt that it will eventually be implemented as well.
Here one could start by integrating all the "combstruct"-information
from the Encyclopedia of Combinatorial Structures (if not already
in OEIS), and then making the search program so sophisticated
that it could operate with those structures based on the input
given by the user. At least the combinatorialists would like it.

Just my two cents,

Yours,

Antti Karttunen

On Mon, Oct 19, 2009 at 11:06 PM, <seqfan-request at list.seqfan.eu> wrote:

>
>
> Message: 9
> Date: Mon, 19 Oct 2009 11:51:05 -0400
> From: Rick Shepherd <rlshepherd2 at gmail.com>
> Subject: [seqfan] Guided browsing of the OEIS based upon personal
> preferences?
> To: seqfan at list.seqfan.eu
> Message-ID:
>        <b949fe1a0910190851h3aefa599yc0318422d5401b72 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hello, SeqFans,
>
> Bear with me, I'm not really being off-topic (although this may be
> interesting to some people strictly for music-related reasons).
> This note will soon come back to the OEIS.  I'll also emphasize now that the
> OEIS already has a Browse feature:
> http://www.research.att.com/~njas/sequences/Sbrowse.html (although I'm
> having difficulty accessing the database at the moment)
>
> Today I ran across an article describing a system called Pandora for
> suggesting songs one may like based upon one's previous statements of
> preferred songs.  Actually, my son had already told me about Pandora at
> least twice -- but this is the first time I've seen a bit about the
> nuts-and-bolts of how these suggestions are made.  Most of us have probably
> experienced some apparently-similar system in some sphere where a product is
> suggested because "other buyers of this product also bought these other
> products" (e.g., Amazon and books/etc) or where the software (using cookies,
> etc.) is clearly attempting to learn what we like (i.e., this isn't new).
> Pandora, in contrast to some of these others, attempts to be more objective
> and makes suggestions based upon straightforward characterizations of
> technical elements of songs rather than "collaborative filtering"
> (popularity or what everyone else is doing) -- but, of course, matters of
> (other people's) taste and subjectivity cannot be completely eliminated
> (yet).
>
> If one replaced "Pandora's music collection" with "the OEIS" and "song" with
> "sequence" (and drew several similar parallels), this article could give
> some food for thought about future directions for the OEIS.  This article
> also touches upon what to add to the collection and how to decide that --
> topics that have been recently discussed on this list for the OEIS.
>
> If the OEIS currently "Contains 164537 sequences", *one day* it might
> actually be so large that it's difficult to find that which you're really
> seeking.  :^J)   Algorithms based upon sophisticated, dynamic saved
> searches (and more) could help direct those who aren't looking to "commit
> complete serendipity" on a given day (The latter I admit is often the mode I
> enjoy but certainly not always.).
>
> Here's the link, "The Song Decoders":
> http://www.nytimes.com/2009/10/18/magazine/18Pandora-t.html?pagewanted=1&em
> (published Oct. 14th, 2009)
> I've found (in the USA) that sometimes it's necessary to be logged-in to
> one's NY Times account (free registration) to access their articles -- and
> sometimes not -- even for the same articles (it seems partly to be based
> upon time of day).
>
> Regards,
> Rick