[seqfan] Re: A metric and UI for Pandora-like OEIS exploration
Andrew Weimholt
andrew at weimholt.com
Mon Oct 26 12:03:22 CET 2009
Hi Robert,
Just wanted to comment that there may be better metrics than merely
looking for substring matches.
Two sequences might not share any terms at all, yet by some metrics be
considered very closely related (for example, if we look at the shape
of their graphs). Ideally, you'd want to come up with many different
characteristics of the sequences and assign each one a dimension, so
that each sequence is placed at some point in this n-dimensional
space. Characteristics might include how quickly the sequence grows,
how chaotic it is, the frequency of primes in the sequence, the shape
of its graph, the categorization of its generating function, etc.
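
To make the idea concrete, here is a rough Python sketch of three such
dimensions and a distance between sequences. Everything in it is an
assumption for illustration only: the function names, the choice of a
least-squares slope of log(1 + |a(n)|) as "growth", and the fraction of
direction changes as a crude stand-in for "chaotic". It is not taken
from Robert's program, and real use would need the features rescaled so
that no single dimension dominates the distance.

```python
import math

def is_prime(n):
    """Trial-division primality test; adequate for small terms."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def growth_rate(terms):
    """Least-squares slope of log(1 + |a(n)|) against n (needs >= 2 terms)."""
    ys = [math.log(1 + abs(t)) for t in terms]
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def prime_frequency(terms):
    """Fraction of the listed terms that are prime."""
    return sum(is_prime(t) for t in terms) / len(terms)

def chaos(terms):
    """Fraction of consecutive first differences that change sign."""
    diffs = [b - a for a, b in zip(terms, terms[1:])]
    flips = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return flips / max(1, len(diffs) - 1)

def feature_vector(terms):
    """Place a sequence at a point in a (here 3-dimensional) feature space."""
    return (growth_rate(terms), prime_frequency(terms), chaos(terms))

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.dist(u, v)

if __name__ == "__main__":
    powers_of_2 = [2 ** k for k in range(10)]
    odd_numbers = [2 * k + 1 for k in range(10)]
    print(distance(feature_vector(powers_of_2), feature_vector(odd_numbers)))
```

Two sequences with no terms in common can still land close together
under a measure like this, which substring matching cannot capture.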
Andrew
On Mon, Oct 26, 2009 at 3:03 AM, Robert Munafo <mrob27 at gmail.com> wrote:
> I noticed that the scores were slightly biased by the A-numbers:
> A000111 is more likely to match A000110 simply because of the
> resemblance in the A-numbers. So I eliminated that by removing the
> A-number from the beginning of each line in the database during the
> initial format-conversion step.
>
> I also found a significant bug that made it overestimate the coverage
> of A over B, which skewed all the scores.
>
> Then I re-did each of the tests in my previous message; all but one
> result was affected. Here's what it does now:
>
> Tribonacci numbers: A000213 <-> A001648 (score 0.270) (Tetranacci
> numbers without leading 4)
>
> Prime numbers: A000040 <-> A019590 (score 0.225) (Fermat's Last Theorem)
>
> Powers of 2: A000079 <-> A005408 (score 0.284) (The odd numbers)
>
> Pennies sequence: A005577 <-> A005576 (score 0.447) (A different
> pennies sequence)
>
> Composite numbers: A002808 <-> A018252 (score 0.357) (The nonprime numbers)
>
> The "How four dogs meet in a field" sequence gave the same match, just
> with a different score:
> A006451 <-> A006454 (score 0.324)
>
> Kaprekar triples: A006887 <-> A060768 (score 0.268) (Pseudo-Kaprekar triples)
>
> Greedy Egyptian fractions: A100140 <-> A117116 (score 0.181)
> (Denominators of an Egyptian Fraction for phi = (1+sqrt(5))/2)
>
> I am confident it is working, at least for sequences that have useful
> stuff in the %C and %H fields. I am also quite happy with the
> responsiveness on the 16-thread Nehalem system. The roughly 15-second
> wait on the Core 2 Duo machine would get rather tiring after a while.
>
> I'll start rating multiple sequences tomorrow. When it lets you do
> each of the basic functions I described in my first message, I'll
> publish the source code.
>
> --
> Robert Munafo -- mrob.com
>
>
> _______________________________________________
>
> Seqfan Mailing list - http://list.seqfan.eu/
>
>