[seqfan] Re: Inconsistencies in OEIS

Thu Sep 17 17:20:57 CEST 2015

You're right that eishelp1 is outdated. I don't think there's a new page
which describes the internal format; perhaps I will write one.

> There are ~ 330 such cases -- many simply stemming from different first-index
> conventions, but others with egregious errors

I should mention that Neil and I have both run this sort of check on
b-files before. Errors crop up from time to time. :/
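
For anyone who wants to repeat this kind of check, here is a rough sketch in Python. It is not the script Neil or I used. It assumes each b-file data line has the form "index value", with blank lines and lines starting with "#" ignored; the helper names and the toy data are made up for illustration.

```python
import re

# Assumed b-file line shape: "<index> <value>", both decimal integers.
BFILE_LINE = re.compile(r"(-?\d+)\s+(-?\d+)")

def parse_bfile(text):
    """Return ({index: value}, [line numbers of malformed lines])."""
    entries, bad_lines = {}, []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        m = BFILE_LINE.fullmatch(line)
        if m is None:
            bad_lines.append(lineno)  # syntactic problem, e.g. a stray backslash
            continue
        entries[int(m.group(1))] = int(m.group(2))
    return entries, bad_lines

def find_mismatches(terms, offset, entries):
    """Compare the listed terms (first index = offset) with the b-file entries."""
    mismatches = []
    for i, term in enumerate(terms):
        n = offset + i
        if n in entries and entries[n] != term:
            mismatches.append((n, term, entries[n]))
    return mismatches

# Toy data only, not taken from any real OEIS entry.
sample = "# toy b-file\n1 1\n2 3\n3 \\45650\n4 7\n"
entries, bad = parse_bfile(sample)
mismatches = find_mismatches([1, 2, 5, 7], 1, entries)
```

Here `bad` collects the line numbers that fail to parse, and `mismatches` holds (index, listed term, b-file value) triples. A first-index disagreement tends to show up as a long run of mismatches, which helps separate it from an isolated wrong term.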

> For now I just flag the following:
>
> - the ligature characters: 'ﬀ', 'ﬁ', 'ﬂ'
> - fullwidth colon character '：' (0xff1a)
> - fullwidth semicolon character '；' (0xff1b)

Good idea! Yes, these should probably be scrubbed.
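
A minimal sketch of such a scrub, assuming Python and its standard unicodedata module. It only touches the five characters flagged above, replacing each with its NFKC compatibility form, so other non-ASCII text is left alone. The function names are hypothetical.

```python
import unicodedata

# The five characters flagged in this thread:
# ligatures ff, fi, fl, fullwidth colon, fullwidth semicolon.
FLAGGED = ("\ufb00", "\ufb01", "\ufb02", "\uff1a", "\uff1b")

def find_flagged(text):
    """List (position, character, Unicode name) for each flagged character."""
    return [(pos, ch, unicodedata.name(ch))
            for pos, ch in enumerate(text) if ch in FLAGGED]

def scrub(text):
    """Replace only the flagged characters by their NFKC compatibility form."""
    return "".join(unicodedata.normalize("NFKC", ch) if ch in FLAGGED else ch
                   for ch in text)

hits = find_flagged("re\ufb02ection")
cleaned = scrub("re\ufb02ection")
```

Normalizing character by character, rather than the whole string, is a deliberate choice here: whole-string NFKC would also rewrite superscripts and other characters that may be intentional.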

> If you can confirm that your list is authoritative (or, as close to authoritative
> as we have), then I can adapt my parser to emit fewer false-positive
> warnings.

Yes. My page
https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Keywords
is the closest thing we have to an authoritative reference on keywords, and
my page
https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Format
is the closest thing we have to an authoritative reference on the b-file
format.
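
To make the field rules from this thread concrete, a checker might encode them roughly as follows. This is only a sketch under stated assumptions: the base list of required fields is the one from eishelp1, the keyword exceptions are the two rules I gave earlier, and everything else (the function name, the exact required set for allocated sequences) is a guess rather than documented behavior.

```python
# Required fields according to the (outdated) eishelp1 page.
BASE_REQUIRED = {"%I", "%N", "%A", "%O", "%K"}

# Sequences with any of these keywords should not have an author.
NO_AUTHOR_KEYWORDS = {"dead", "recycled", "allocated", "allocating"}

# An allocated sequence should have only these lines.
ALLOCATED_FIELDS = {"%I", "%S", "%N", "%K"}

def check_fields(keywords, fields):
    """Return a list of warnings for one entry (keywords and fields are sets)."""
    warnings = []
    if "allocated" in keywords:
        for f in sorted(fields - ALLOCATED_FIELDS):
            warnings.append(f"unexpected {f} in allocated sequence")
        # Assumption: only the base fields that are also allowed are required.
        required = BASE_REQUIRED & ALLOCATED_FIELDS
    elif keywords & NO_AUTHOR_KEYWORDS:
        # Assumption: apart from %A, the base requirements still apply.
        required = BASE_REQUIRED - {"%A"}
    else:
        required = BASE_REQUIRED
    for f in sorted(required - fields):
        warnings.append(f"missing {f}")
    return warnings

ok_allocated = check_fields({"allocated"}, {"%I", "%S", "%N", "%K"})
no_author = check_fields({"nonn"}, {"%I", "%N", "%O", "%K"})
```

With rules of this shape, an entry such as A000017 would no longer be reported for its missing %A line.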

Charles Greathouse
Analyst/Programmer
Case Western Reserve University

On Wed, Sep 16, 2015 at 6:15 PM, Sidney Cadot <sidney at jigsaw.nl> wrote:

> Hi,
>
> > On P1, sequences with any of the keywords (dead, recycled, allocated,
> > allocating) should not have an author, so A000017 is not in error.
>
> I got my information from:
>
>   https://oeis.org/eishelp1.html
>
> ... which says in bold red letters which fields are required: %I, %N,
> %A, %O, %K.
>
> But I noted already that this information is probably outdated, as it
> also prescribes the %V, %W and %X lines which are no longer used, and
> states that %S, %T, %U are non-negative.
>
> If there is more authoritative information available (such as the rule
> you give above), I'd be happy to incorporate it in my parser/checker.
> I'd also suggest that it would be useful if the old 'help' pages would
> indicate their staleness and contained a link to more up-to-date
> information.
>
> > Your example for P2 and P3 is an allocated sequence, so it should have
> > only %I, %S, %N, and %K lines (which it does).
>
> Ok. I will update my parser once the rules are clear to me, and
> re-run. (In particular, do you have a reference for the rule you give
> here, and other rules that apply?)
>
> > Sequences for which all terms are in {-1, 0, 1} should have only the first
> > offset number. Sequences for which the first term with absolute value
> > greater than 1 is not displayed only have a second offset if it is manually
> > entered (but some have not been). In particular the %O line of A038219 is
> > correct.
>
> Ok. Is this convention documented?
>
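
The offset convention quoted above could be sketched like this in Python. The assumption, not stated in this thread, is that the second offset number is the 1-based position of the first displayed term with absolute value greater than 1; the function name is hypothetical.

```python
def expected_second_offset(terms):
    """Return the 1-based position of the first term with |a(n)| > 1.

    Returns None when no displayed term qualifies, in which case the
    %O line would carry only the first offset number (unless a second
    one was entered manually).
    """
    for position, term in enumerate(terms, start=1):
        if abs(term) > 1:
            return position
    return None

with_large_term = expected_second_offset([1, 1, 2, 3, 5])
only_small_terms = expected_second_offset([0, 1, -1, 0, 1])
```

A checker using this would warn only when a computed position exists and disagrees with the %O line, and would stay silent when it gets None.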
> > (P5) and (P6) look like serious errors -- b-files and terms should always
> > agree. (Would someone look into A1 especially?)
>
> Ok. There are ~ 330 such cases -- many simply stemming from different
> first-index conventions, but others with egregious errors, e.g.
> A001130, A061212, A104128, A131393/4/6/7, A175854, A177866/7/8,
> A180605, A181993, A184163, A184667, A184956, A185210, ...
>
> > (P7) is technically not an error, but those b-files should probably be
> > removed, at least once it's verified that the terms match.
>
> Ok.
>
> > (P8) are errors, often because files were cut off. These should be flagged
> > and corrected as time allows.
>
> Ok.
>
> > (P9) sounds like an error worth fixing (though the second part of the
> > offset is, in my opinion, the least important part of the sequence).
>
> Ok.
>
> > Would you be more specific about (P10)? Which field has bad characters?
>
> This occurs in the freeform 'informational' fields: %N, %C, %D, %H,
> %F, %e, %p, %t, %o, %Y, %A, %E. I made an exhaustive list of the
> Unicode characters occurring for each directive. There are many
> different characters used, and people use different conventions
> (sometimes Unicode, sometimes TeX, sometimes HTML entities). There are
> also Unicode control sequences present that are probably mistakes.
>
> For now I just flag the following:
>
> - the ligature characters: 'ﬀ', 'ﬁ', 'ﬂ'
> - fullwidth colon character '：' (0xff1a)
> - fullwidth semicolon character '；' (0xff1b)
>
> Especially the ligature characters are a bit nasty since they hamper
> searchability. For example:
>
> A209251, %N: "reﬂection" uses an 'ﬂ' ligature (which is a single character).
>
> There are about 30 instances of this. But there are many more issues
> concerning the letters used, and the descriptions in general.
>
> > (P11) and (P13) are worth fixing -- maybe we should canonicalize the
> > keywords, it could even speed up searches if coded right. I fixed your
> > examples.
>
> Ok.
>
> > On (P12), I don't see a problem with the b-file for A050399.
>
> Entry 2890 has this:
>
> 2890     \45650
>
> (but of course this is just an example -- there are 18 b-files that have
> some kind of syntactic issue.)
>
> > Neil, what do you think of (P14)?
>
> As background: those two %I directives are markedly different than all
> others. To be precise: all other %I lines match one of the following
> regular expressions:
>
>             "N[0-9]{4}$",
>             "M[0-9]{4}$",
>             "M[0-9]{4} N[0-9]{4}$",
>             "M[0-9]{4} N[0-9]{4} N[0-9]{4}$"
>
> ... whereas ..
>
> A005254's %I line is "%I M2779 and M2780"
> A006809's %I line is "%I M2796 = M2797"
>
> It's no big deal, but it's also easy to homogenize these two so that all
> %I lines match the single pattern "[MN][0-9]{4}( [MN][0-9]{4})*$".
>
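
The single pattern proposed above can be tried out with a short Python sketch. It assumes the %I prefix has already been stripped and uses fullmatch in place of the trailing "$"; the value "M2779 M2780" is a hypothetical normalized form, not what the entry currently contains.

```python
import re

# The homogeneous pattern proposed in the thread, without the trailing "$"
# because fullmatch already anchors both ends.
I_LINE = re.compile(r"[MN][0-9]{4}( [MN][0-9]{4})*")

def is_homogeneous(value):
    """True if the text after '%I ' is a space-separated list of M/N numbers."""
    return I_LINE.fullmatch(value) is not None

results = {
    "M2779 and M2780": is_homogeneous("M2779 and M2780"),  # current A005254 form
    "M2796 = M2797": is_homogeneous("M2796 = M2797"),      # current A006809 form
    "M2779 M2780": is_homogeneous("M2779 M2780"),          # hypothetical fix
}
```

Both current forms are rejected and the hypothetical normalized form is accepted, so the two outliers are the only ones that would need editing.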
> > For (P15) the keyword is documented here
> > https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Keywords
> > and is quite rare, as you noticed. This isn't a bug. You might also see the
> > super-rare keyword allocating from time to time.
>
> Yes, but these keywords are not documented in what for me (as an
> outsider) looks like the authoritative help pages:
>
>   https://oeis.org/eishelp1.html
>   https://oeis.org/eishelp2.html
>
> If you can confirm that your list is authoritative (or, as close to
> authoritative as we have), then I can adapt my parser to emit fewer
> false-positive warnings.
>
> > Coincidentally I was just considering the possibility of a QA process for
> > the OEIS recently. Nothing exists at the moment though.
>
> I'm willing to assist, if extra hands are needed.
>
> I can also make full lists of all generated issues available. For now,
> I do want to make sure that my parser implements the best possible set
> of rules. Apart from the Wiki pages that you have pointed out, are
> there any other pages I should be aware of?
>
> Regards
>   Sidney
>
> _______________________________________________
>
> Seqfan Mailing list - http://list.seqfan.eu/
>