[seqfan] Inconsistencies in OEIS

Sidney Cadot sidney at jigsaw.nl
Wed Sep 16 18:28:34 CEST 2015


Hello all,

Over the past two weeks or so I have obtained the OEIS data to prepare
for automated searches for connections between sequences, in the hope
of finding hitherto undiscovered relations. I have fetched both the
internal-format data for each sequence entry and its associated b-file.

The first order of business has been to organize and parse all the
data. In the process, I have discovered a number of issues in the OEIS
data, of varying severity.
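
For reference, each line of an entry in the OEIS internal format starts
with a directive such as %I, %S, %T, %U, %O, %K or %A, followed by the
A-number and that line's data. The terms themselves are spread over the
%S, %T and %U lines. Here is a minimal Python sketch of parsing one
entry under those assumptions, plus the simple presence checks (P1 to
P3 in the list below). It is an illustration only, not the parser
behind the counts below. The function names and the sample entry, with
its made-up A-number, are hypothetical.

```python
from collections import defaultdict


def parse_internal(text):
    """Group the payloads of one entry by directive letter ('S', 'O', 'K', ...)."""
    directives = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2 or not line.startswith("%"):
            continue  # skip blank or non-directive lines
        letter = line[1]
        # Expected shape: "%X A123456 payload"; the payload may be missing.
        parts = line.split(None, 2)
        payload = parts[2] if len(parts) > 2 else ""
        directives[letter].append(payload)
    return directives


def terms_from_stu(directives):
    """Join the %S, %T and %U payloads and return the terms as integers."""
    payloads = directives.get("S", []) + directives.get("T", []) + directives.get("U", [])
    joined = ",".join(payloads)  # a line may or may not end with a comma
    return [int(tok) for tok in joined.split(",") if tok.strip()]


def basic_checks(directives):
    """Presence checks corresponding to P1, P2 and P3."""
    problems = []
    if not directives.get("A"):
        problems.append("P1: missing %A directive")
    if not directives.get("O"):
        problems.append("P2: missing %O directive")
    s_lines = directives.get("S", [])
    if not s_lines or not s_lines[0].strip():
        problems.append("P3: missing or empty %S line")
    return problems


# A made-up, abridged entry (hypothetical A-number, deliberately without %A).
sample = """%I A999999
%S A999999 0,1,1,2,3,5,8,13,
%T A999999 21,34,55
%O A999999 0,4
%K A999999 nonn,easy"""

entry = parse_internal(sample)
print(terms_from_stu(entry))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(basic_checks(entry))    # ['P1: missing %A directive']
```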

Below is a list of 15 issues currently recognized by my data parser.

- The first (parenthesized) number is a problem ID, chosen by me.
- The second number is an occurrence count (out of approximately 262,000 entries).
- Following that is a description of the issue.
- Following that are one or two sequence numbers that demonstrate the issue.

(P1)   1844  missing %A directive
             A000017
(P2)    660  missing %O directive
             A237988
(P3)    660  empty %S line
             A237988
(P4)    456  ill-formatted %O directive (must be %O n,m)
             A038219
(P5)    330  %S/%T/%U / b-file mismatch
             A000001, A001130
(P6)    221  first-index mismatch (%S/%T/%U vs b-file)
             A000001, A000199
(P7)     41  %S/%T/%U has more values than b-file
             A002235
(P8)     31  non-sequential indices in b-file
             A049532, A064215
(P9)     28  %O magnitude claim (second number) incorrect according to b-file
             A013596
(P10)    23  bad characters in directive data (e.g. ligatures ff, fl)
             A007054, A007283
(P11)    21  duplicate keyword in %K directive
             A003184
(P12)    18  parse error in b-file
             A050399
(P13)     9  empty keyword (or trailing comma) in %K directive
             A032556
(P14)     2  bad format for %I directive
             A005254, A006809
(P15)     1  undocumented keyword in %K directive ('probation')
             A247556
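
The b-file checks (P5 to P8, and P12) can be sketched in Python as
follows. This assumes the usual b-file layout of one "n a(n)" pair per
line, with blank lines and '#' comment lines ignored, and it aligns the
%S/%T/%U terms with b-file indices via the %O offset. The names are
hypothetical, and this is not the checker that produced the counts
above.

```python
def parse_bfile(text):
    """Return (n, a(n)) pairs; raise ValueError on a malformed line (cf. P12)."""
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"P12: line {lineno}: expected 'n a(n)', got {raw!r}")
        try:
            pairs.append((int(fields[0]), int(fields[1])))
        except ValueError:
            raise ValueError(f"P12: line {lineno}: non-integer field in {raw!r}") from None
    return pairs


def compare_with_stu(stu_terms, offset, pairs):
    """Compare %S/%T/%U terms (starting at index `offset`) with b-file pairs."""
    problems = []
    bfile = dict(pairs)
    stu = {offset + i: v for i, v in enumerate(stu_terms)}
    # P5: an index present in both sources with different values.
    if any(stu[n] != bfile[n] for n in stu.keys() & bfile.keys()):
        problems.append("P5: %S/%T/%U / b-file mismatch")
    # P6: the b-file should start at the %O offset.
    if pairs and pairs[0][0] != offset:
        problems.append("P6: first-index mismatch (%S/%T/%U vs b-file)")
    # P7: the b-file should hold at least as many terms as %S/%T/%U.
    if len(stu_terms) > len(pairs):
        problems.append("P7: %S/%T/%U has more values than b-file")
    # P8: b-file indices should increase by exactly 1 per line.
    indices = [n for n, _ in pairs]
    if any(b != a + 1 for a, b in zip(indices, indices[1:])):
        problems.append("P8: non-sequential indices in b-file")
    return problems


# A made-up b-file with a gap at index 4.
example = """# hypothetical b-file
0 0
1 1
2 1
3 2
5 5
"""
print(compare_with_stu([0, 1, 1, 2, 3, 5, 8], 0, parse_bfile(example)))
# ['P7: %S/%T/%U has more values than b-file', 'P8: non-sequential indices in b-file']
```

Aligning by index rather than by position means that an offset error
(P6) also tends to surface as a value mismatch (P5), which is
consistent with A000001 appearing under both.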

Some of these issues (e.g. P2, P3) occur mostly or exclusively in
recent entries, and can be expected to be resolved in time.

Several issues are quite minor and mostly syntactic: P1, P10, P11,
P13, P14, P15.
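
Such syntactic checks are easy to automate. Below is a Python sketch
for P10, P11, P13 and P15. The DOCUMENTED_KEYWORDS set is an
assumption, written from memory and possibly incomplete, and should be
checked against the OEIS keyword documentation before use. The function
names are hypothetical.

```python
import unicodedata

# ASSUMPTION: an illustrative keyword list written from memory; verify it
# against the OEIS documentation before relying on it.
DOCUMENTED_KEYWORDS = {
    "allocated", "base", "bref", "changed", "cofr", "cons", "core", "dead",
    "dumb", "dupe", "easy", "eigen", "fini", "frac", "full", "hard", "hear",
    "less", "look", "more", "mult", "new", "nice", "nonn", "obsc", "recycled",
    "sign", "tabf", "tabl", "uned", "unkn", "walk", "word",
}


def keyword_checks(k_payload, documented=DOCUMENTED_KEYWORDS):
    """Checks on a %K payload corresponding to P11, P13 and P15."""
    problems = []
    keywords = [k.strip() for k in k_payload.split(",")]
    if "" in keywords:
        problems.append("P13: empty keyword (or trailing comma) in %K")
    named = [k for k in keywords if k]
    if len(set(named)) != len(named):
        problems.append("P11: duplicate keyword in %K")
    unknown = sorted(set(named) - documented)
    if unknown:
        problems.append("P15: undocumented keyword(s) in %K: " + ", ".join(unknown))
    return problems


def ligature_check(payload):
    """P10: flag Latin typographic ligatures (U+FB00..U+FB06, e.g. ff, fl)."""
    bad = sorted({ch for ch in payload if "\ufb00" <= ch <= "\ufb06"})
    if not bad:
        return []
    return ["P10: ligature character(s): " + ", ".join(unicodedata.name(c) for c in bad)]


print(keyword_checks("nonn,easy,nonn,"))  # flags P13 and P11
print(keyword_checks("nonn,probation"))   # flags P15
print(ligature_check("e\ufb00ective"))    # flags P10 (LATIN SMALL LIGATURE FF)
```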

The remaining seven issues (P4, P5, P6, P7, P8, P9, P12) are perhaps
somewhat more serious, since they mark places where the primary
sequence data (i.e., the numbers themselves) and/or the offset metadata
(indicating where the sequence starts) are ambiguous or inconsistent.
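
The offset checks (P4 and P9) can be sketched in Python as below. This
assumes the OEIS convention that the second %O number is the 1-based
position of the first term whose absolute value exceeds 1, or 1 if
there is no such term; for terms 0, 1, 1, 2, ... that position is 4.
The function names are hypothetical, and the check only sees whatever
terms the b-file contains.

```python
import re


def parse_offset(o_payload):
    """Return (offset, position) from an '%O n,m' payload, or None if malformed (P4)."""
    m = re.fullmatch(r"(-?\d+),(\d+)", o_payload.strip())
    return (int(m.group(1)), int(m.group(2))) if m else None


def expected_position(values):
    """1-based position of the first term with |a(n)| > 1, or 1 if there is none."""
    for pos, v in enumerate(values, start=1):
        if abs(v) > 1:
            return pos
    return 1


def offset_checks(o_payload, bfile_values):
    """Report P4 (malformed %O) or P9 (second %O number contradicts the b-file)."""
    parsed = parse_offset(o_payload)
    if parsed is None:
        return ["P4: ill-formatted %O directive (expected n,m)"]
    claimed = parsed[1]
    implied = expected_position(bfile_values)
    if claimed != implied:
        return [f"P9: %O claims position {claimed}, b-file implies {implied}"]
    return []


print(offset_checks("0,4", [0, 1, 1, 2, 3, 5]))  # []
print(offset_checks("0,3", [0, 1, 1, 2, 3, 5]))  # P9: claims 3, b-file implies 4
print(offset_checks("0", [0, 1, 1, 2]))          # P4
```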

I would be happy to help resolve these issues, but I am not sure how
to approach that: going through the default workflow is a bit daunting
given the number of individual issues. I would be particularly
interested in reaching whoever is responsible for the data quality of
the database, but I don't know whether such a role even exists and, if
so, whom to contact.

Any suggestions would be welcome.

Kind regards,

  Sidney Cadot


