[seqfan] Re: Inconsistencies in OEIS

Neil Sloane njasloane at gmail.com
Thu Sep 17 01:05:56 CEST 2015


Concerning P14:

(P14)      2  Bad format for %I directive
           A005254, A006809

There is nothing wrong here. Please modify your rules to allow these
formats.

Best regards
Neil

Neil J. A. Sloane, President, OEIS Foundation.
11 South Adelaide Avenue, Highland Park, NJ 08904, USA.
Also Visiting Scientist, Math. Dept., Rutgers University, Piscataway, NJ.
Phone: 732 828 6098; home page: http://NeilSloane.com
Email: njasloane at gmail.com


On Wed, Sep 16, 2015 at 7:03 PM, Neil Sloane <njasloane at gmail.com> wrote:

> Just to deal with the last point:
>
> > I would be particularly interested to reach anyone responsible for data
> > quality of the database -- but I don't know if such a role even exists,
> > and if so, whom to contact.
>
> That would be me!
>
> I will look at the other points in the next few days
>
> Best regards
> Neil
>
>
> On Wed, Sep 16, 2015 at 4:31 PM, Charles Greathouse <
> charles.greathouse at case.edu> wrote:
>
>> On P1, sequences with any of the keywords (dead, recycled, allocated,
>> allocating) should not have an author, so A000017 is not in error. Your
>> example for P2 and P3 is an allocated sequence, so it should have only %I,
>> %S, %N, and %K lines (which it does).
>>
>> Sequences for which all terms are in {-1, 0, 1} should have only the first
>> offset number. Sequences for which the first term with absolute value
>> greater than 1 is not displayed only have a second offset if it is manually
>> entered (but some have not been). In particular the %O line of A038219 is
>> correct.
>>
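
A minimal Python sketch of the offset rule in the quoted paragraph above. It assumes the second offset number is the 1-based position of the first displayed term whose absolute value exceeds 1, and that an entry with no such term carries only the first number. The names `second_offset` and `offset_field` are hypothetical and not part of any OEIS tool.

```python
from typing import Optional, Sequence


def second_offset(terms: Sequence[int]) -> Optional[int]:
    """Return the 1-based position of the first term with |term| > 1, or None."""
    for position, term in enumerate(terms, start=1):
        if abs(term) > 1:
            return position
    return None


def offset_field(first_index: int, terms: Sequence[int]) -> str:
    """Build the value of an offset field as 'n,m', or just 'n' when no m applies."""
    second = second_offset(terms)
    if second is None:
        # All displayed terms are in {-1, 0, 1}: only the first number is given.
        return str(first_index)
    return f"{first_index},{second}"
```

With terms 0, 1, 1, 2, 3, 5 starting at index 0, the first term larger than 1 in absolute value is the fourth, so the field would read `0,4`. A parser built on this rule would accept a bare `n` wherever `second_offset` returns `None` instead of flagging it.
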
>> (P5) and (P6) look like serious errors -- b-files and terms should always
>> agree. (Would someone look into A000001 especially?)
>>
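
One way to check that terms and b-file agree, sketched in Python. It assumes the entry's terms have already been parsed into a list, the b-file into a dictionary from index to value, and that the first offset number gives the index of the first term. The function name `find_mismatches` is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def find_mismatches(
    first_index: int,
    terms: Sequence[int],
    bfile: Dict[int, int],
) -> List[Tuple[int, int, Optional[int]]]:
    """Compare the entry's terms with the b-file, index by index.

    Returns (index, entry_term, bfile_value) for every disagreement;
    bfile_value is None when the b-file has no line for that index.
    """
    mismatches = []
    for i, term in enumerate(terms):
        n = first_index + i
        b_value = bfile.get(n)
        if b_value != term:
            mismatches.append((n, term, b_value))
    return mismatches
```

An empty result means every displayed term is present in the b-file with the same value. A wrong first index tends to show up as a run of shifted values, and a b-file shorter than the term list shows up as trailing `None` entries.
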
>> (P7) is technically not an error, but those b-files should probably be
>> removed, at least once it's verified that the terms match.
>>
>> (P8) are errors, often because files were cut off. These should be flagged
>> and corrected as time allows.
>>
>> (P9) sounds like an error worth fixing (though the second part of the
>> offset is, in my opinion, the least important part of the sequence).
>>
>> Would you be more specific about (P10)? Which field has bad characters?
>>
>> (P11) and (P13) are worth fixing -- maybe we should canonicalize the
>> keywords; it could even speed up searches if coded right. I fixed your
>> examples.
>>
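
A small Python sketch of what canonicalizing a keyword field could look like. It assumes keywords are comma-separated, and it picks alphabetical order as the canonical form, which is only one possible choice. The function name `canonicalize_keywords` is hypothetical.

```python
def canonicalize_keywords(field: str) -> str:
    """Drop empty and duplicate keywords and return them in sorted order."""
    seen = set()
    for raw in field.split(","):
        keyword = raw.strip()
        if keyword:  # skips empty entries, e.g. from a trailing comma
            seen.add(keyword)
    return ",".join(sorted(seen))
```

For example, `canonicalize_keywords("nonn,easy,,nonn,")` gives `"easy,nonn"`, which removes both the duplicate keyword (P11) and the empty one (P13).
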
>> On (P12), I don't see a problem with the b-file for A050399. See
>> https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Format
>> for the gory details on (what I consider) the correct format of a b-file.
>> Of course it would be nicer if it was formatted according to the strict
>> rather than the loose guidelines, but that's a relatively minor detail.
>> Ideally all b-file writing software would output the strict format and all
>> b-file reading software would accept the loose format.
>>
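
The "write strict, read loose" idea can be sketched in Python as follows. The notions of strict and loose used here are assumptions made for illustration: the reader tolerates blank lines, `#` comments and extra whitespace, and the writer emits exactly one `index value` pair per line. The wiki page linked above gives the actual rules. The names `read_bfile_loose` and `write_bfile_strict` are hypothetical.

```python
from typing import Iterable, List, Tuple


def read_bfile_loose(text: str) -> List[Tuple[int, int]]:
    """Parse b-file text, tolerating comments, blank lines and extra spaces."""
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        content = line.split("#", 1)[0].strip()  # drop any '#' comment
        if not content:
            continue
        fields = content.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 'index value', got {line!r}"
            )
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def write_bfile_strict(pairs: Iterable[Tuple[int, int]]) -> str:
    """Emit one 'index value' line per pair, single space, trailing newline."""
    return "".join(f"{n} {value}\n" for n, value in pairs)
```

Reading with the loose function and writing back with the strict one normalizes a file without changing its data.
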
>> Neil, what do you think of (P14)?
>>
>> For (P15) the keyword is documented here
>> https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Keywords
>> and is quite rare, as you noticed. This isn't a bug. You might also see the
>> super-rare keyword allocating from time to time.
>>
>> > I would be particularly interested to reach anyone responsible for data
>> > quality of the database -- but I don't know if such a role even exists,
>> > and if so, who to contact.
>>
>> Coincidentally I was just considering the possibility of a QA process for
>> the OEIS recently. Nothing exists at the moment though.
>>
>> Charles Greathouse
>> Analyst/Programmer
>> Case Western Reserve University
>>
>> On Wed, Sep 16, 2015 at 12:28 PM, Sidney Cadot <sidney at jigsaw.nl> wrote:
>>
>> > Hello all,
>> >
>> > Over the past two weeks or so I have obtained the OEIS data to prepare
>> > for automated searches for connections between sequences, in the hope
>> > of finding hitherto undiscovered relations. I have fetched both the
>> > internal data for each sequence entry, and its associated b-file.
>> >
>> > First order of business has been to organize and parse all data. In
>> > the process, I have discovered a number of issues in the OEIS data,
>> > with various levels of seriousness.
>> >
>> > Below is a list of 15 issues currently recognized by my data parser.
>> >
>> > - The first (parenthesized) number is a problem ID, chosen by me.
>> > - The second number is an occurrence count (out of +/- 262000 entries).
>> > - Following that is a description of the issue.
>> > - Following that are one or two sequence numbers that demonstrate the
>> > issue.
>> >
>> > (P1)   1844  missing %A directive.
>> >              A000017
>> > (P2)    660  missing %O directive.
>> >              A237988
>> > (P3)    660  empty %S line.
>> >              A237988
>> > (P4)    456  ill-formatted %O directive (must be %O n,m)
>> >              A038219
>> > (P5)    330  %S/%T/%U / b-file mismatch
>> >              A000001, A001130
>> > (P6)    221  first-index mismatch (%S/%T/%U vs b-file)
>> >              A000001, A000199
>> > (P7)     41  %S/%T/%U has more values than b-file
>> >              A002235
>> > (P8)     31  non-sequential indices in b-file
>> >              A049532, A064215
>> > (P9)     28  %O magnitude claim (second number) incorrect according to b-file
>> >              A013596
>> > (P10)    23  Bad characters in directive data (e.g. ligature ff, fl).
>> >              A007054, A007283
>> > (P11)    21  Duplicate keyword in %K directive
>> >              A003184
>> > (P12)    18  Parse error in b-file
>> >              A050399
>> > (P13)     9  Empty keyword (or trailing comma) in %K directive
>> >              A032556
>> > (P14)     2  Bad format for %I directive
>> >              A005254, A006809
>> > (P15)     1  Undocumented keyword in %K directive ('probation')
>> >              A247556
>> >
>> > Some of these issues occur mostly or exclusively in recent entries,
>> > and could be expected to be resolved in time: e.g. P2, P3.
>> >
>> > Many issues are quite minor and mostly syntactic, e.g. P1, P10, P11,
>> > P13, P14, P15.
>> >
>> > The remaining seven issues P4, P5, P6, P7, P8, P9, P12 are perhaps a
>> > bit serious, since they reflect places where the primary sequence data
>> > (i.e., the numbers), and/or the offset metadata (indicating where the
>> > sequence starts) is not 100% clear.
>> >
>> > I would be happy to help in resolving these issues but I am not sure
>> > how to approach that -- going through the default workflow is a bit
>> > daunting given the number of individual issues. I would be
>> > particularly interested to reach anyone responsible for data quality
>> > of the database -- but I don't know if such a role even exists, and if
>> > so, who to contact.
>> >
>> > Any suggestions would be welcome.
>> >
>> > Kind regards,
>> >
>> >   Sidney Cadot
>> >
>> > _______________________________________________
>> >
>> > Seqfan Mailing list - http://list.seqfan.eu/
>> >
>>
>
>

