[seqfan] Re: project: looking for connections between sequences
mlb at well.com
Wed Aug 11 20:25:56 CEST 2010
>="N. J. A. Sloane" <njas at research.att.com>
> Several people have suggested that it would be interesting
> to run the whole 200,000 sequences through Superseeker,
> to see if there are interesting connections to be discovered.
This is great, a longstanding dream! Some ideas and suggestions:
* Add the core neighborhood first
Superseeker essentially defines the arcs of a graph connecting sequences.
Usually any given sequence has several dozen immediate "superseeker
neighbors", since there are several dozen transforms. This project, at
whatever scale, takes some set of sequences, and finds all the sequences
already in the OEIS at "superseeker distance" one.
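To make the graph picture concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: TOY_DB is a tiny hand-typed stand-in for the OEIS, the two transforms are simple examples rather than Superseeker's actual list, and neighbors() and its min_overlap parameter are hypothetical names.

```python
from itertools import accumulate

# Toy stand-in for the OEIS: A-number -> leading terms (tiny hand-typed excerpt).
TOY_DB = {
    "A000012": [1, 1, 1, 1, 1, 1, 1, 1],        # all ones
    "A000027": [1, 2, 3, 4, 5, 6, 7, 8],        # positive integers
    "A000290": [0, 1, 4, 9, 16, 25, 36, 49],    # squares
    "A005408": [1, 3, 5, 7, 9, 11, 13, 15],     # odd numbers
}

def partial_sums(terms):
    return list(accumulate(terms))

def first_differences(terms):
    return [b - a for a, b in zip(terms, terms[1:])]

# Illustrative transforms only; not Superseeker's actual list.
TRANSFORMS = {
    "partial sums": partial_sums,
    "first differences": first_differences,
}

def neighbors(a_number, db=TOY_DB, min_overlap=6):
    """Return (transform name, A-number) arcs leaving a_number at distance one."""
    arcs = []
    for name, transform in TRANSFORMS.items():
        image = transform(db[a_number])
        for other, terms in db.items():
            n = min(len(image), len(terms))
            # Naive prefix match on the overlapping terms.
            if other != a_number and n >= min_overlap and image[:n] == terms[:n]:
                arcs.append((name, other))
    return arcs

print(neighbors("A000012"))
print(neighbors("A000290"))
```

Note that neighbors("A005408") finds nothing in this toy table: the partial sums of the odd numbers are the squares, but the stored squares begin with a(0) = 0, so a naive prefix match misses the arc.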
Before doing any of these automatic searches I recommend first generating
ALL the neighbors of, at least, the core sequences (currently only 169) and
ADD any missing ones to the OEIS.
These new submissions, being automatically generated, can be trusted to be
properly formatted and accurate--imposing zero editing burden.
Moreover, they could include automatically generated comments describing the
transform involved, and links back to their parent sequence.
Since the core sequences are *especially* interesting, their 1-neighbors are
presumably interesting, so they ought to be in the OEIS anyway. Later, the
full core neighborhood will be present to support future discoveries
involving core 2-neighbors.
More ambitiously there's the "nice neighborhood" around the current kernel
population of 6355.
* Treat offsets and "extra" initial elements especially carefully
Many of the interesting transforms (eg Mobius) are sensitive to the offset
of the sequence and/or choices for "optional" initial terms (eg a(0)), and
messing these up can actually prevent future discoveries. I don't have any
magic bullet to suggest, except vigilance QA-ing submissions, and perhaps
some checking or normalizing pre-processing of inputs to these transforms.
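A small Python sketch of the offset problem, using the usual definition of the Mobius transform, b(n) = sum over d dividing n of mu(n/d)*a(d), with the first supplied term taken to be a(1). The function names are my own and this is not Superseeker's code.

```python
def mobius(n):
    """Mobius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:                     # one remaining prime factor
        result = -result
    return result

def mobius_transform(terms):
    """Mobius transform of terms, where terms[0] is taken to be a(1)."""
    return [
        sum(mobius(n // d) * terms[d - 1] for d in range(1, n + 1) if n % d == 0)
        for n in range(1, len(terms) + 1)
    ]

squares_from_1 = [1, 4, 9, 16, 25, 36]    # a(1), a(2), ...
squares_from_0 = [0, 1, 4, 9, 16, 25]     # same sequence with a(0) = 0 kept

print(mobius_transform(squares_from_1))   # [1, 3, 8, 12, 24, 24]
print(mobius_transform(squares_from_0))   # [0, 1, 4, 8, 16, 20]
```

The two outputs share no common run of terms, so a lookup based on the second would not find the sequence produced by the first. Keeping or dropping a(0) changes which divisors each term is paired with.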
* Free superseeker soon!
> By the way, at present Superseeker only lives on my machine
> at work, where it is shared with my colleagues. It queues up
> submissions, so only one job runs at a time.
> The plan is to move it to the oeis.org site, once the OEIS wiki is finished.
It is most unfortunate for superseeker to be a bottleneck. Anything we can
do to alleviate this would be great.
Obviously allowing other people to host this functionality using their own
resources should be supported.
> I am also planning to make the source code for Superseeker public.
> But that will take a while.
Is this because of the time and effort involved in readying it and putting
it up, or because of IP issues?
In any case the transforms superseeker uses are public, so enthusiastic
seqfans are free to offload some of the burden now, by initiating an "open
source seeker" project.
This could either be hosted at oeisf.org or one of the existing code forums.
Another option, which could dodge some IP and support issues, might be to
simply make a binary available.
* Create a distributed superseeker
This could then be run "grass roots" by seqfans for big projects, like
SETI@home etc. Or, it seems plausible that Google or Amazon or someone like
that might even be willing to donate cloud resources to run it on.
* Ensure superseeker is run on all submissions.
If nothing else this would prevent duplicates and trivial variations. But
it also might immediately provoke the submitter into new discoveries or
insights while the topic is "hot" for them. This also could lead to
submissions with more interesting comments and other info (especially if
superseeker had an OEIS-format friendly output option).
If this is too expensive to do now at the time of submission (an impediment
that's surely temporary in the face of Moore's Law) then there ought to be a
background crawler that continuously runs over the entries and generates
alerts when it discovers hits.
Further, each OEIS entry could have an associated timestamp when (some)
superseeker processes it, and/or superseekers could generate certificates.
Of course this kind of supplemental information doesn't need to be stored in
the main OEISF database, it can be provided as an independent service (by
anyone) that's keyed with A-numbers...
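One possible shape for such an independent service, sketched in Python. All names here (SeekerIndex, stale, crawl, and the seek callback standing in for a real superseeker run) are hypothetical, and times are plain numbers passed in by the caller.

```python
from dataclasses import dataclass, field

@dataclass
class SeekerIndex:
    """Side table keyed by A-number, kept outside the main database."""
    checked_at: dict = field(default_factory=dict)   # A-number -> time of last run
    hits: dict = field(default_factory=dict)         # A-number -> hits found

    def stale(self, modified_at):
        """A-numbers never processed, or edited since their last run."""
        return sorted(
            a for a, m in modified_at.items()
            if self.checked_at.get(a, float("-inf")) < m
        )

    def crawl(self, modified_at, seek, now):
        """Run seek on each stale entry, timestamp it, and return new alerts."""
        alerts = []
        for a in self.stale(modified_at):
            found = seek(a)           # placeholder for a real superseeker run
            self.checked_at[a] = now
            if found:
                self.hits[a] = found
                alerts.append((a, found))
        return alerts

# Usage with a fake seeker that only knows one connection.
index = SeekerIndex()
modified_at = {"A000012": 10, "A000027": 10}
fake_seek = lambda a: ["A000027"] if a == "A000012" else []
print(index.crawl(modified_at, fake_seek, now=20))
```

After a crawl, only entries edited later than their stored timestamp are picked up again, so the crawler can run continuously without redoing finished work.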
Any seqfans interested in taking any of this on?