[seqfan] Re: A metric and UI for Pandora-like OEIS exploration

Mon Oct 26 17:01:08 CET 2009

Andrew --

It isn't just matching terms in the sequence; it is matching the
entire database record. Based on the results I've been getting so far,
it appears that the %H (links) fields are causing most of the top
matches.

Look at the examples I posted, and you'll see that. For example,
A100140 and A117116 don't have many terms in common, but both
reference the "Egyptian Fraction" article in MathWorld.
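
To illustrate the effect, here is a rough sketch in Python. It is not
the matching code behind the results above; it assumes a simple Jaccard
overlap of word tokens, and the two records, their field names and
their terms are made up for the example (they are not real OEIS
entries).

```python
import re

def tokens(text):
    """Return the set of lowercased word/number tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def record_text(rec):
    """Concatenate every field of a record into one string."""
    return " ".join(rec.values())

# Hypothetical records: no terms in common, but the same link text.
rec_x = {
    "terms": "1, 2, 6, 38, 457",
    "links": "MathWorld: Egyptian Fraction",
}
rec_y = {
    "terms": "3, 8, 61, 5020",
    "links": "MathWorld: Egyptian Fraction",
}

# Similarity using only the terms of each sequence.
terms_only = jaccard(tokens(rec_x["terms"]), tokens(rec_y["terms"]))

# Similarity using the whole record (terms plus links).
whole_record = jaccard(tokens(record_text(rec_x)), tokens(record_text(rec_y)))

print(terms_only, whole_record)
```

With these toy records the terms-only score is zero, while the
whole-record score is not, purely because of the shared link text.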

Notice that the %H fields (and most of the other fields, for that
matter) are created by humans and contain relevant information about a
sequence. Two sequences might not be related by any property that
you or I would think of (such as the examples you cite: "chaoticness",
speed of growth, incidence of prime terms, etc.), *but* if they are
related, someone is bound to have written about it.

It doesn't always work well; see, for example, A000079 and A005408. But
that's why I'm asking you people to think about it.

Andrew Weimholt wrote:
>
> Hi Robert,
>
> Just wanted to comment that there may be better metrics than merely looking for substring matches.
> Two sequences might not share any terms at all, but by some metrics be considered very closely related
> (For example if we look at the shape of their graphs). Ideally, you'd want to come up with many different
> characteristics of the sequences, and assign each one of them a dimension, and then each sequence would
> be placed at some point in this n-dimensional space. Characteristics may include such things as how quickly the
> sequence grows, how chaotic it is, the frequency of primes in the sequence, the shape of its graph, the categorization of
> its generating function, etc.

--
 Robert Munafo  --  mrob.com