[seqfan] Re: Inconsistencies in OEIS

Wed Sep 16 22:31:32 CEST 2015

On P1, sequences with any of the keywords (dead, recycled, allocated,
allocating) should not have an author, so A000017 is not in error. Your
example for P2 and P3 is an allocated sequence, so it should have only %I,
%S, %N, and %K lines (which it does).

Sequences for which all terms are in {-1, 0, 1} should have only the first
offset number. Sequences whose first term with absolute value greater than 1
lies beyond the displayed terms have a second offset only if it was entered
manually (and for some it has not been). In particular the %O line of
A038219 is correct.
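
A rough sketch of the offset rule in Python, for illustration only. It
assumes the usual convention that the second offset number is the 1-based
position of the first displayed term with absolute value greater than 1.
The names second_offset and format_offset are hypothetical, not taken from
any OEIS software.

```python
def second_offset(terms):
    """Return the 1-based position of the first term with |a(n)| > 1.

    Returns None when no displayed term qualifies (for example, when all
    terms are in {-1, 0, 1}).
    """
    for position, term in enumerate(terms, start=1):
        if abs(term) > 1:
            return position
    return None


def format_offset(first_index, terms):
    """Build the text of an %O line body from the index of the first term
    and the displayed terms. Assumption: when no second number can be
    derived, only the first number is written."""
    position = second_offset(terms)
    if position is None:
        return str(first_index)
    return f"{first_index},{position}"


if __name__ == "__main__":
    print(format_offset(0, [0, 1, 1, 2, 3, 5, 8]))  # Fibonacci-like data
    print(format_offset(0, [1, -1, 0, 1, -1, 0]))   # all terms in {-1, 0, 1}
```

A parser that insists on "%O n,m" would reject the second case, which is
why the single-number form needs to be accepted.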

(P5) and (P6) look like serious errors -- b-files and terms should always
agree. (Would someone look into A000001 especially?)
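
One way such an agreement check could look, as a minimal sketch. It assumes
the displayed terms are available as a list of integers, the first offset
number as an integer, and the b-file as a dict mapping n to a(n). The
function name compare_with_bfile and the problem labels are made up for
this example.

```python
def compare_with_bfile(offset, terms, bfile):
    """Compare displayed terms against b-file entries.

    offset -- index of the first displayed term (first %O number)
    terms  -- displayed terms, in order
    bfile  -- dict mapping index n to a(n)

    Returns a list of (kind, n, detail) tuples; an empty list means the
    two sources agree on every displayed term.
    """
    problems = []
    # First-index mismatch: the b-file should start where the terms start.
    if bfile and min(bfile) != offset:
        problems.append(("first-index", offset, min(bfile)))
    for n, term in enumerate(terms, start=offset):
        if n not in bfile:
            # The entry displays a term that the b-file does not contain.
            problems.append(("missing", n, term))
        elif bfile[n] != term:
            problems.append(("value", n, (term, bfile[n])))
    return problems


if __name__ == "__main__":
    print(compare_with_bfile(1, [2, 3, 5], {1: 2, 2: 3, 3: 5, 4: 7}))
    print(compare_with_bfile(0, [1, 2], {1: 1, 2: 2}))
```

The "missing" label also catches the case where the entry shows more
values than the b-file holds.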

(P7) is technically not an error, but those b-files should probably be
removed, at least once it's verified that the terms match.

(P8) are errors, often because files were cut off. These should be flagged
and corrected as time allows.
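
Flagging these could be automated along the following lines. This is only
a sketch: it assumes the b-file indices have already been read into a list
in file order, and the name index_gaps is hypothetical.

```python
def index_gaps(indices):
    """Return every adjacent pair (prev, cur) of b-file indices where cur
    is not prev + 1. This covers gaps, repeats, and out-of-order lines."""
    gaps = []
    for prev, cur in zip(indices, indices[1:]):
        if cur != prev + 1:
            gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    # A file that was cut off in the middle and resumed later.
    print(index_gaps([1, 2, 3, 5, 6]))
```

An empty result means the indices run consecutively from the first one.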

(P9) sounds like an error worth fixing (though the second part of the
offset is, in my opinion, the least important part of the sequence).

Would you be more specific about (P10)? Which field has bad characters?

(P11) and (P13) are worth fixing -- maybe we should canonicalize the
keywords; it could even speed up searches if coded right. I fixed your
examples.
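
To make "canonicalize" concrete, here is a small sketch that removes empty
and duplicate keywords from the body of a %K line. It is one possible
reading, not an agreed rule: it keeps the first occurrence of each keyword
and preserves the original order, and the function name
canonicalize_keywords is invented for the example.

```python
def canonicalize_keywords(raw):
    """Clean a comma-separated keyword string.

    Strips whitespace, drops empty keywords (such as those produced by a
    trailing or doubled comma), and drops repeats after their first
    occurrence. Order of first occurrence is preserved.
    """
    seen = set()
    result = []
    for keyword in raw.split(","):
        keyword = keyword.strip()
        if keyword and keyword not in seen:
            seen.add(keyword)
            result.append(keyword)
    return ",".join(result)


if __name__ == "__main__":
    print(canonicalize_keywords("nonn,easy,nonn,"))
```

Sorting the result as well would give every keyword set a single
spelling, at the cost of changing the order people are used to seeing.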

On (P12), I don't see a problem with the b-file for A050399. See
https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Format
for the gory details on (what I consider) the correct format of a b-file.
Of course it would be nicer if it was formatted according to the strict
rather than the loose guidelines, but that's a relatively minor detail.
Ideally all b-file writing software would output the strict format and all
b-file reading software would accept the loose format.
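
As an illustration of that reader/writer split, here is a minimal sketch.
The actual rules are on the wiki page above; the version below is a
simplification that assumes a loose file may contain blank lines, "#"
comments, and arbitrary whitespace, and that a strict line is the index,
one space, the value, and a newline. Both function names are hypothetical.

```python
def read_bfile_loose(text):
    """Parse b-file text tolerantly into a list of (n, a(n)) pairs.

    Assumed loose rules: anything after '#' is a comment, blank lines are
    skipped, and the two integers may be separated by any whitespace.
    """
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'n a(n)', got {line!r}")
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def write_bfile_strict(pairs):
    """Write pairs in the assumed strict form: 'n a(n)' with a single
    space and a newline after every line, and nothing else."""
    return "".join(f"{n} {value}\n" for n, value in pairs)


if __name__ == "__main__":
    messy = "# A000040\n\n1 2\r\n  2\t3  \n3 5 # third prime\n"
    print(write_bfile_strict(read_bfile_loose(messy)), end="")
```

Reading loosely and writing strictly means a round trip through such a
tool tidies a file without changing its data.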

Neil, what do you think of (P14)?

For (P15) the keyword is documented here
https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Keywords
and is quite rare, as you noticed. This isn't a bug. You might also see the
super-rare keyword allocating from time to time.

> I would be particularly interested to reach anyone responsible for data
> quality of the database -- but I don't know if such a role even exists, and
> if so, who to contact.

Coincidentally, I was recently considering the possibility of a QA process
for the OEIS. Nothing exists at the moment, though.

Charles Greathouse
Analyst/Programmer
Case Western Reserve University

On Wed, Sep 16, 2015 at 12:28 PM, Sidney Cadot <sidney at jigsaw.nl> wrote:

> Hello all,
>
> Over the past two weeks or so I have obtained the OEIS data to prepare
> for automated searches for connections between sequences, in the hope
> of finding hitherto undiscovered relations. I have fetched both the
> internal data for each sequence entry, and its associated b-file.
>
> First order of business has been to organize and parse all data. In
> the process, I have discovered a number of issues in the OEIS data,
> with various levels of seriousness.
>
> Below is a list of 15 issues currently recognized by my data parser.
>
> - The first (parenthesized) number is a problem ID, chosen by me.
> - The second number is an occurrence count (out of +/- 262000 entries).
> - Following that is a description of the issue.
> - Following that are one or two sequence numbers that demonstrate the
> issue.
>
> (P1)   1844  missing %A directive.
>           A000017
> (P2)    660  missing %O directive.
>           A237988
> (P3)    660  empty %S line.
>           A237988
> (P4)    456  ill-formatted %O directive (must be %O n,m)
>           A038219
> (P5)    330  %S/%T/%U / b-file mismatch
>           A000001, A001130
> (P6)    221  first-index mismatch (%S/%T/%U vs b-file)
>           A000001, A000199
> (P7)     41  %S/%T/%U has more values than b-file
>           A002235
> (P8)     31  non-sequential indices in b-file
>           A049532, A064215
> (P9)     28  %O magnitude claim (second number) incorrect according to b-file
>           A013596
> (P10)    23  Bad characters in directive data (e.g. ligature ff, fl).
>           A007054, A007283
> (P11)    21  Duplicate keyword in %K directive
>           A003184
> (P12)    18  Parse error in b-file
>           A050399
> (P13)     9  Empty keyword (or trailing comma) in %K directive
>           A032556
> (P14)     2  Bad format for %I directive
>           A005254, A006809
> (P15)     1  Undocumented keyword in %K directive ('probation')
>           A247556
>
> Some of these issues occur mostly or exclusively in recent entries,
> and could be expected to be resolved in time: e.g. P2, P3.
>
> Many issues are quite minor and mostly syntactic, e.g. P1, P10, P11,
> P13, P14, P15.
>
> The remaining seven issues P4, P5, P6, P7, P8, P9, P12 are perhaps a
> bit more serious, since they reflect places where the primary sequence data
> (i.e., the numbers), and/or the offset metadata (indicating where the
> sequence starts) is not 100% clear.
>
> I would be happy to help in resolving these issues but I am not sure
> how to approach that -- going through the default workflow is a bit
> daunting given the number of individual issues. I would be
> particularly interested to reach anyone responsible for data quality
> of the database -- but I don't know if such a role even exists, and if
> so, who to contact.
>
> Any suggestions would be welcome.
>
> Kind regards,
>
>   Sidney Cadot