[seqfan] Re: Inconsistencies in OEIS

Neil Sloane njasloane at gmail.com
Thu Sep 17 01:46:01 CEST 2015


Concerning P11:

(P11)     21  Duplicate keyword in %K directive
           A003184

I fixed 21 sequences that had such an error, but you might check that
I caught them all. There was one that had both tabl and tabf keywords.
I don't know if you considered that an error - it is.
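
The keyword checks behind P11 and P13 can be sketched in a few lines of
Python. This is only a minimal sketch, not the parser used in this thread:
the name check_keywords is hypothetical, the input is assumed to be just the
comma-separated keyword string of a %K line, and treating tabl together with
tabf as a conflict is an assumption based on the remark above.

```python
def check_keywords(keyword_string: str) -> list[str]:
    """Return a list of problems found in a %K keyword string (hypothetical helper)."""
    keywords = [kw.strip() for kw in keyword_string.split(",")]
    problems = []

    # An empty field means a doubled comma, a trailing comma, or an empty string.
    if any(kw == "" for kw in keywords):
        problems.append("empty keyword")

    # Report every keyword that has already been seen earlier in the list.
    seen = set()
    for kw in keywords:
        if kw and kw in seen:
            problems.append(f"duplicate keyword: {kw}")
        seen.add(kw)

    # Assumption: tabl and tabf are mutually exclusive.
    if {"tabl", "tabf"} <= seen:
        problems.append("both tabl and tabf")

    return problems
```

For example, check_keywords("nonn,,easy") reports an empty keyword, and
check_keywords("nonn,easy") reports nothing.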

Again, thank you for catching these errors.

Best regards
Neil

Neil J. A. Sloane, President, OEIS Foundation.
11 South Adelaide Avenue, Highland Park, NJ 08904, USA.
Also Visiting Scientist, Math. Dept., Rutgers University, Piscataway, NJ.
Phone: 732 828 6098; home page: http://NeilSloane.com
Email: njasloane at gmail.com


On Wed, Sep 16, 2015 at 7:16 PM, Neil Sloane <njasloane at gmail.com> wrote:

> Concerning P9:
> there were 9 sequences that either had a doubled comma ",," (8 such)
> or a trailing comma (one such). I could not find any with a null keyword
> string.
> These have now been corrected - thanks for catching these errors.
>
> Best regards
> Neil
>
>
>
> On Wed, Sep 16, 2015 at 7:05 PM, Neil Sloane <njasloane at gmail.com> wrote:
>
>> Concerning P14:
>>
>> (P14)      2  Bad format for %I directive
>>            A005254, A006809
>>
>> There is nothing wrong here. Please modify your rules to allow these
>> formats.
>>
>> Best regards
>> Neil
>>
>>
>>
>> On Wed, Sep 16, 2015 at 7:03 PM, Neil Sloane <njasloane at gmail.com> wrote:
>>
>>> Just to deal with the last point:
>>>
>>> > I would be particularly interested to reach anyone responsible for data
>>> > quality of the database -- but I don't know if such a role even exists,
>>> > and if so, whom to contact.
>>>
>>> That would be me!
>>>
>>> I will look at the other points in the next few days.
>>>
>>> Best regards
>>> Neil
>>>
>>>
>>>
>>> On Wed, Sep 16, 2015 at 4:31 PM, Charles Greathouse <
>>> charles.greathouse at case.edu> wrote:
>>>
>>>> On P1, sequences with any of the keywords (dead, recycled, allocated,
>>>> allocating) should not have an author, so A000017 is not in error. Your
>>>> example for P2 and P3 is an allocated sequence, so it should have only
>>>> %I, %S, %N, and %K lines (which it does).
>>>>
>>>> Sequences for which all terms are in {-1, 0, 1} should have only the
>>>> first offset number. Sequences for which the first term with absolute
>>>> value greater than 1 is not displayed only have a second offset if it is
>>>> manually entered (but some have not been). In particular the %O line of
>>>> A038219 is correct.
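
The rule for the second offset number described above can be sketched as
follows. This is a hedged illustration, not OEIS code: the name
second_offset is hypothetical, and the sketch assumes the second number is
the 1-based position of the first displayed term whose absolute value
exceeds 1, with None standing for "no second number can be derived".

```python
from typing import Optional, Sequence


def second_offset(terms: Sequence[int]) -> Optional[int]:
    """Return the 1-based position of the first term with |term| > 1, if any."""
    for position, term in enumerate(terms, start=1):
        if abs(term) > 1:
            return position
    # All displayed terms are in {-1, 0, 1}: nothing to report.
    return None
```

A checker for P4 or P9 would compare this value with the second number on
the %O line only when the function returns something other than None.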
>>>>
>>>> (P5) and (P6) look like serious errors -- b-files and terms should
>>>> always agree. (Would someone look into A1 especially?)
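
The agreement between terms and b-file that P5, P6 and P7 test for could be
checked roughly like this. It is a sketch under stated assumptions: the name
compare_with_bfile is hypothetical, terms is assumed to be the list of
integers from the %S/%T/%U lines, offset the first number of the %O line,
and bfile a list of (index, value) pairs already read from the b-file.

```python
def compare_with_bfile(offset, terms, bfile):
    """Return a list of disagreements between the entry's terms and its b-file."""
    if not bfile:
        return ["b-file is empty"]

    problems = []

    # P6: the b-file should start at the index given by the offset.
    if bfile[0][0] != offset:
        problems.append("first-index mismatch")

    values = [value for _, value in bfile]

    # P7: the entry lists more terms than the b-file contains.
    if len(terms) > len(values):
        problems.append("entry has more terms than b-file")

    # P5: compare term by term over the common prefix.
    if any(t != v for t, v in zip(terms, values)):
        problems.append("term mismatch")

    return problems
```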
>>>>
>>>> (P7) is technically not an error, but those b-files should probably be
>>>> removed, at least once it's verified that the terms match.
>>>>
>>>> (P8) are errors, often because files were cut off. These should be
>>>> flagged and corrected as time allows.
>>>>
>>>> (P9) sounds like an error worth fixing (though the second part of the
>>>> offset is, in my opinion, the least important part of the sequence).
>>>>
>>>> Would you be more specific about (P10)? Which field has bad characters?
>>>>
>>>> (P11) and (P13) are worth fixing -- maybe we should canonicalize the
>>>> keywords, it could even speed up searches if coded right. I fixed your
>>>> examples.
>>>>
>>>> On (P12), I don't see a problem with the b-file for A050399. See
>>>> https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Format
>>>> for the gory details on (what I consider) the correct format of a
>>>> b-file. Of course it would be nicer if it was formatted according to
>>>> the strict rather than the loose guidelines, but that's a relatively
>>>> minor detail. Ideally all b-file writing software would output the
>>>> strict format and all b-file reading software would accept the loose
>>>> format.
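
A tolerant b-file reader in the spirit of "accept the loose format" might
look like the sketch below. The exact strict and loose rules are on the wiki
page linked above and are not reproduced here; this sketch only assumes that
a data line holds an index and a value separated by whitespace, that "#"
starts a comment, and that blank lines may appear. The names read_bfile and
indices_are_sequential are hypothetical.

```python
def read_bfile(text: str) -> list[tuple[int, int]]:
    """Parse b-file text into (index, value) pairs, skipping comments and blanks."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any comment, then whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected 'index value', got: {raw!r}")
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def indices_are_sequential(pairs: list[tuple[int, int]]) -> bool:
    """Check that each index is exactly one more than the previous one (cf. P8)."""
    return all(b[0] == a[0] + 1 for a, b in zip(pairs, pairs[1:]))
```

Raising an error on a malformed line, rather than skipping it, is a design
choice made here so that truncated files are noticed instead of silently
shortened.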
>>>>
>>>> Neil, what do you think of (P14)?
>>>>
>>>> For (P15) the keyword is documented here
>>>> https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Keywords
>>>> and is quite rare, as you noticed. This isn't a bug. You might also see
>>>> the super-rare keyword allocating from time to time.
>>>>
>>>> > I would be particularly interested to reach anyone responsible for
>>>> > data quality of the database -- but I don't know if such a role even
>>>> > exists, and if so, who to contact.
>>>>
>>>> Coincidentally I was just considering the possibility of a QA process
>>>> for the OEIS recently. Nothing exists at the moment though.
>>>>
>>>> Charles Greathouse
>>>> Analyst/Programmer
>>>> Case Western Reserve University
>>>>
>>>> On Wed, Sep 16, 2015 at 12:28 PM, Sidney Cadot <sidney at jigsaw.nl>
>>>> wrote:
>>>>
>>>> > Hello all,
>>>> >
>>>> > Over the past two weeks or so I have obtained the OEIS data to prepare
>>>> > for automated searches for connections between sequences, in the hope
>>>> > of finding hitherto undiscovered relations. I have fetched both the
>>>> > internal data for each sequence entry, and its associated b-file.
>>>> >
>>>> > First order of business has been to organize and parse all data. In
>>>> > the process, I have discovered a number of issues in the OEIS data,
>>>> > with various levels of seriousness.
>>>> >
>>>> > Below is a list of 15 issues currently recognized by my data parser.
>>>> >
>>>> > - The first (parenthesized) number is a problem ID, chosen by me.
>>>> > - The second number is an occurrence count (out of +/- 262000 entries).
>>>> > - Following that is a description of the issue.
>>>> > - Following that are one or two sequence numbers that demonstrate the
>>>> > issue.
>>>> >
>>>> > (P1)   1844  missing %A directive.
>>>> >           A000017
>>>> > (P2)    660  missing %O directive.
>>>> >           A237988
>>>> > (P3)     660  empty %S line.
>>>> >            A237988
>>>> > (P4)    456  ill-formatted %O directive (must be %O n,m)
>>>> >           A038219
>>>> > (P5)    330  %S/%T/%U / b-file mismatch
>>>> >           A000001, A001130
>>>> > (P6)    221  first-index mismatch (%S/%T/%U vs b-file)
>>>> >           A000001, A000199
>>>> > (P7)     41  %S/%T/%U has more values than b-file
>>>> >           A002235
>>>> > (P8)     31  non-sequential indices in b-file
>>>> >           A049532, A064215
>>>> > (P9)     28  %O magnitude claim (second number) incorrect according to b-file
>>>> >           A013596
>>>> > (P10)     23  Bad characters in directive data (e.g. ligature ff, fl).
>>>> >            A007054, A007283
>>>> > (P11)     21  Duplicate keyword in %K directive
>>>> >            A003184
>>>> > (P12)     18  Parse error in b-file
>>>> >            A050399
>>>> > (P13)      9  Empty keyword (or trailing comma) in %K directive
>>>> >            A032556
>>>> > (P14)      2  Bad format for %I directive
>>>> >            A005254, A006809
>>>> > (P15)      1  Undocumented keyword in %K directive ('probation')
>>>> >            A247556
>>>> >
>>>> > Some of these issues occur mostly or exclusively in recent entries,
>>>> > and could be expected to be resolved in time: e.g. P2, P3.
>>>> >
>>>> > Many issues are quite minor and mostly syntactic, e.g. P1, P10, P11,
>>>> > P13, P14, P15.
>>>> >
>>>> > The remaining seven issues P4, P5, P6, P7, P8, P9, P12 are perhaps a
>>>> > bit more serious, since they reflect places where the primary sequence
>>>> > data (i.e., the numbers), and/or the offset metadata (indicating where
>>>> > the sequence starts) is not 100% clear.
>>>> >
>>>> > I would be happy to help in resolving these issues but I am not sure
>>>> > how to approach that -- going through the default workflow is a bit
>>>> > daunting given the number of individual issues. I would be
>>>> > particularly interested to reach anyone responsible for data quality
>>>> > of the database -- but I don't know if such a role even exists, and if
>>>> > so, who to contact.
>>>> >
>>>> > Any suggestions would be welcome.
>>>> >
>>>> > Kind regards,
>>>> >
>>>> >   Sidney Cadot
>>>> >
>>>> > _______________________________________________
>>>> >
>>>> > Seqfan Mailing list - http://list.seqfan.eu/
>>>> >
>>>>
>>>
>>>
>>
>

