[seqfan] Inconsistencies in OEIS
Sidney Cadot
sidney at jigsaw.nl
Wed Sep 16 18:28:34 CEST 2015
Hello all,
Over the past two weeks or so I have obtained the OEIS data in preparation
for automated searches for connections between sequences, in the hope
of finding hitherto undiscovered relations. I have fetched both the
internal-format data for each sequence entry and its associated b-file.
The first order of business has been to organize and parse all of this
data. In the process, I have discovered a number of issues in the OEIS
data, of varying severity.
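A minimal Python sketch of the organize-and-parse step, under stated assumptions: each directive line is taken to have the form '%X Annnnnn data', and %S/%T/%U lines are assumed to break only between terms. The function names parse_internal and sequence_terms are hypothetical and are not part of any OEIS tooling.

```python
from collections import defaultdict


def parse_internal(text):
    """Group the data of one entry by directive, e.g. '%S' -> ['0,1,1,2,'].

    Assumes every directive line looks like '%X Annnnnn data' (the layout
    of the OEIS internal format); any other non-blank line is collected
    under the key None so it can be reported as a problem later.
    """
    entry = defaultdict(list)
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) >= 2 and len(parts[0]) == 2 and parts[0].startswith("%"):
            # A directive line; its data part may be empty (cf. P3).
            entry[parts[0]].append(parts[2] if len(parts) == 3 else "")
        elif line.strip():
            entry[None].append(line)
    return dict(entry)


def sequence_terms(entry):
    """Concatenate the %S, %T and %U data and split it into integers.

    Assumes line breaks fall between terms, so joining the lines with
    commas and dropping empty fields (from trailing commas) is enough.
    """
    lines = entry.get("%S", []) + entry.get("%T", []) + entry.get("%U", [])
    return [int(t) for t in ",".join(lines).split(",") if t.strip()]
```

Keeping unrecognized lines under a separate key, rather than discarding them, makes it possible to count them as parse problems afterwards.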
Below is a list of 15 issues currently recognized by my data parser.
- The first (parenthesized) number is a problem ID, chosen by me.
- The second number is an occurrence count (out of approximately 262,000 entries).
- Following that is a description of the issue.
- Following that are one or two sequence numbers that demonstrate the issue.
(P1) 1844 missing %A directive.
A000017
(P2) 660 missing %O directive.
A237988
(P3) 660 empty %S line.
A237988
(P4) 456 ill-formatted %O directive (must be %O n,m)
A038219
(P5) 330 %S/%T/%U / b-file mismatch
A000001, A001130
(P6) 221 first-index mismatch (%S/%T/%U vs b-file)
A000001, A000199
(P7) 41 %S/%T/%U has more values than b-file
A002235
(P8) 31 non-sequential indices in b-file
A049532, A064215
(P9) 28 %O magnitude claim (second number) incorrect according to b-file
A013596
(P10) 23 Bad characters in directive data (e.g. the ligatures ff, fl).
A007054, A007283
(P11) 21 Duplicate keyword in %K directive
A003184
(P12) 18 Parse error in b-file
A050399
(P13) 9 Empty keyword (or trailing comma) in %K directive
A032556
(P14) 2 Bad format for %I directive
A005254, A006809
(P15) 1 Undocumented keyword in %K directive ('probation')
A247556
Some of these issues occur mostly or exclusively in recent entries
and can be expected to be resolved over time (e.g. P2, P3).
Many issues are quite minor and mostly syntactic, e.g. P1, P10, P11,
P13, P14, P15.
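The minor syntactic checks can be illustrated with a short Python sketch covering P1, P10, P11 and P13. It assumes an entry has already been parsed into a dictionary mapping each directive ('%A', '%K', ...) to its data lines. The function name syntax_issues is hypothetical, and its P10 test only looks for the Latin ligature characters U+FB00 to U+FB06, which is likely narrower than what a full parser would flag.

```python
def syntax_issues(entry):
    """Report the minor syntactic problems P1, P10, P11 and P13.

    `entry` is assumed to map each directive ('%A', '%K', ...) to the list
    of its data strings, one string per line.
    """
    issues = []
    if not entry.get("%A"):
        issues.append("P1")                    # no %A (author) line
    text = "".join(s for lines in entry.values() for s in lines)
    # Only the Latin ligatures U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, ...)
    # are treated as bad characters in this sketch.
    if any(0xFB00 <= ord(c) <= 0xFB06 for c in text):
        issues.append("P10")
    for line in entry.get("%K", []):
        words = [w.strip() for w in line.split(",")]
        if "" in words:
            issues.append("P13")               # empty keyword or trailing comma
        named = [w for w in words if w]
        if len(named) != len(set(named)):
            issues.append("P11")               # same keyword listed twice
    return issues
```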
The remaining seven issues (P4, P5, P6, P7, P8, P9, P12) are arguably
more serious, since they mark places where the primary sequence data
(i.e., the numbers themselves) and/or the offset metadata (indicating
where the sequence starts) are not entirely clear.
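The checks behind these more serious issues can be sketched in the same way. The Python below is an illustration only: the helper names parse_bfile and data_issues are hypothetical, b-files are assumed to consist of 'n a(n)' lines plus '#' comments and blank lines, and the P9 check assumes the second %O number is the 1-based position of the first term with absolute value greater than 1 (or 1 if there is none).

```python
import re


def parse_bfile(text):
    """Parse b-file text into a list of (n, a(n)) pairs.

    Assumes one 'n a(n)' pair per line, with blank lines and '#' comment
    lines skipped; anything else raises ValueError (cf. P12).
    """
    pairs = []
    for lineno, line in enumerate(text.splitlines(), 1):
        s = line.strip()
        if not s or s.startswith("#"):
            continue
        fields = s.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'n a(n)', got {line!r}")
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def data_issues(terms, offset_text, bfile_text):
    """Cross-check the %S/%T/%U terms and the %O data against a b-file.

    terms:       integers taken from %S/%T/%U, in order
    offset_text: data part of the %O line, e.g. '0,4'
    bfile_text:  raw contents of the b-file
    """
    m = re.fullmatch(r"(-?\d+),(\d+)", offset_text.strip())
    if not m:
        return ["P4"]                      # %O is not of the form n,m
    offset, second = int(m.group(1)), int(m.group(2))
    try:
        pairs = parse_bfile(bfile_text)
    except ValueError:
        return ["P12"]                     # b-file could not be parsed
    issues = []
    indices = [n for n, _ in pairs]
    if any(b != a + 1 for a, b in zip(indices, indices[1:])):
        issues.append("P8")                # gaps or repeats in b-file indices
    if pairs and pairs[0][0] != offset:
        issues.append("P6")                # b-file starts at another index
    if len(terms) > len(pairs):
        issues.append("P7")                # %S/%T/%U longer than b-file
    bvalues = dict(pairs)
    # Compare terms that share an index n; indices absent from the b-file
    # are left to the P6/P7 checks above.
    if any(bvalues.get(offset + i, t) != t for i, t in enumerate(terms)):
        issues.append("P5")
    # Assumed convention: second %O number = 1-based position of the first
    # term with |a(n)| > 1, or 1 if there is no such term.
    values = [v for _, v in pairs]
    expected = next((i for i, v in enumerate(values, 1) if abs(v) > 1), 1)
    if second != expected:
        issues.append("P9")
    return issues
```

The function stops early on P4 and P12, since without a usable offset or a readable b-file the remaining comparisons are meaningless.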
I would be happy to help resolve these issues, but I am not sure how
to approach that: going through the default workflow is a bit
daunting given the number of individual issues. I would be
particularly interested in reaching whoever is responsible for the
data quality of the database, but I don't know whether such a role
even exists, and if so, whom to contact.
Any suggestions would be welcome.
Kind regards,
Sidney Cadot