[seqfan] Re: Inconsistencies in OEIS

Sidney Cadot sidney at jigsaw.nl
Thu Sep 17 00:15:10 CEST 2015


Hi,

> On P1, sequences with any of the keywords (dead, recycled, allocated,
> allocating) should not have an author, so A000017 is not in error.

I got my information from:

  https://oeis.org/eishelp1.html

... which says in bold red letters which fields are required: %I, %N,
%A, %O, %K.

But I noted already that this information is probably outdated, as it
also prescribes the %V, %W and %X lines which are no longer used, and
states that %S, %T, %U are non-negative.

If there is more authoritative information available (such as the rule
you give above), I'd be happy to incorporate it in my parser/checker.
I'd also suggest that it would be useful if the old 'help' pages
indicated their staleness and contained a link to more up-to-date
information.

> Your example for P2 and P3 is an allocated sequence, so it should have only %I,
> %S, %N, and %K lines (which it does).

Ok. I will update my parser once the rules are clear to me, and
re-run. (In particular, do you have a reference for the rule you give
here, and other rules that apply?)

> Sequences for which all terms are in {-1, 0, 1} should have only the first
> offset number. Sequences for which the first term with absolute value
> greater than 1 is not displayed only have a second offset if it is manually
> entered (but some have not been). In particular the %O line of A038219 is
> correct.

Ok. Is this convention documented?
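
As a minimal sketch of how a checker could apply the offset convention
quoted above: I am assuming here that the second offset number is the
1-based position of the first displayed term whose absolute value
exceeds 1. The function name expected_offset is hypothetical and not
taken from any existing tool.

```python
def expected_offset(first_index, terms):
    """Return the %O value implied by the convention quoted above.

    Assumption: the second number is the 1-based position of the first
    displayed term with absolute value greater than 1. If no displayed
    term qualifies, only the first offset number is returned.
    """
    for position, term in enumerate(terms, start=1):
        if abs(term) > 1:
            return f"{first_index},{position}"
    return str(first_index)
```

A sequence whose second offset was entered manually would still need
special treatment, since that case cannot be derived from the displayed
terms.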

> (P5) and (P6) look like serious errors -- b-files and terms should always
> agree. (Would someone look into A1 especially?)

Ok. There are ~ 330 such cases -- many simply stemming from different
first-index conventions, but others with egregious errors, e.g.
A001130, A061212, A104128, A131393/4/6/7, A175854, A177866/7/8,
A180605, A181993, A184163, A184667, A184956, A185210, ...
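
For illustration, the comparison between displayed terms and a b-file
might look like the sketch below. It assumes the b-file has already
been parsed into a dict mapping n to a(n), and that the displayed terms
start at the first offset number; both the function name and the data
layout are hypothetical.

```python
def compare_terms_with_bfile(offset, terms, bfile):
    """List disagreements between displayed terms and b-file entries.

    offset: first index of the displayed terms (first %O number).
    terms:  displayed terms, in order.
    bfile:  dict mapping index n to value a(n) (assumed pre-parsed).
    Returns (n, displayed_term, bfile_value) tuples; bfile_value is
    None when the b-file has no entry for n.
    """
    mismatches = []
    for n, term in enumerate(terms, start=offset):
        if n not in bfile:
            mismatches.append((n, term, None))
        elif bfile[n] != term:
            mismatches.append((n, term, bfile[n]))
    return mismatches
```

A run of mismatches in which every displayed term equals the b-file
value at a neighbouring index points to a first-index convention
difference rather than a wrong term.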

> (P7) is technically not an error, but those b-files should probably be
> removed, at least once it's verified that the terms match.

Ok.

> (P8) are errors, often because files were cut off. These should be flagged
> and corrected as time allows.

Ok.

> (P9) sounds like an error worth fixing (though the second part of the
> offset is, in my opinion, the least important part of the sequence).

Ok.

> Would you be more specific about (P10)? Which field has bad characters?

This occurs in the freeform 'informational' fields: %N, %C, %D, %H,
%F, %e, %p, %t, %o, %Y, %A, %E. I made an exhaustive list of the
Unicode characters occurring for each directive. There are many
different characters used, and people use different conventions
(sometimes Unicode, sometimes TeX, sometimes HTML entities). There are
also Unicode control sequences present that are probably mistakes.

For now I just flag the following:

- the ligature characters: 'ff', 'fi', 'fl'
- fullwidth colon character ':' (0xff1a)
- fullwidth semicolon character ';' (0xff1b)

The ligature characters especially are a bit nasty since they hamper
searchability. For example:

A209251, %N: "reflection" uses an 'fl' ligature (which is a single character).

There are about 30 instances of this. But there are many more issues
concerning the letters used, and the descriptions in general.
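
A minimal sketch of this kind of flagging follows. The two fullwidth
code points are the ones given above; the ligature code points U+FB00,
U+FB01 and U+FB02 are my assumption of which characters are meant, and
the names FLAGGED and flag_characters are hypothetical.

```python
import unicodedata

# Ligatures ff, fi, fl (assumed code points) and the fullwidth
# colon / semicolon (0xff1a, 0xff1b).
FLAGGED = {"\ufb00", "\ufb01", "\ufb02", "\uff1a", "\uff1b"}

def flag_characters(text):
    """Return (position, code point, Unicode name) for each flagged character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in FLAGGED
    ]

def searchable_form(text):
    """NFKC normalization folds these compatibility characters to plain ASCII."""
    return unicodedata.normalize("NFKC", text)
```

The second function shows why the ligatures hurt searchability: a plain
search for "reflection" only matches after normalization.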

> (P11) and (P13) are worth fixing -- maybe we should canonicalize the
> keywords, it could even speed up searches if coded right. I fixed your
> examples.

Ok.

> On (P12), I don't see a problem with the b-file for A050399.

Entry 2890 has this:

2890     \45650

(but of course this is just an example -- there are 18 b-files that have
some kind of syntactic issue.)
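
As a sketch of the syntactic check, assuming a b-file data line
consists of an integer index, whitespace, and an integer value, and
that blank lines and lines starting with '#' are acceptable (these
rules are my working assumption, not an official specification):

```python
import re

# Assumed shape of a data line: "<index> <value>", both optionally signed.
BFILE_LINE = re.compile(r"-?[0-9]+[ \t]+-?[0-9]+[ \t]*")

def is_valid_bfile_line(line):
    """Return True if the line is blank, a '#' comment, or an 'n a(n)' pair."""
    stripped = line.rstrip("\n")
    if stripped.strip() == "" or stripped.startswith("#"):
        return True
    return BFILE_LINE.fullmatch(stripped) is not None
```

With these rules the entry quoted above is rejected because of the
stray backslash before the value.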

> Neil, what do you think of (P14)?

As background: those two %I directives are markedly different from all
others. To be precise: all other %I lines match one of the following
regular expressions:

            "N[0-9]{4}$",
            "M[0-9]{4}$",
            "M[0-9]{4} N[0-9]{4}$",
            "M[0-9]{4} N[0-9]{4} N[0-9]{4}$"

... whereas ...

A005254's %I line is "%I M2779 and M2780"
A006809's %I line is "%I M2796 = M2797"

It's no big deal, but it's also easy to homogenize so that all lines
will match: "[MN][0-9]{4}( [MN][0-9]{4})*$".
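
To illustrate, here is the proposed pattern applied to the part of the
%I line after the directive. I am assuming the expression is anchored
at the start of the text (as Python's re.match does); the helper name
is hypothetical.

```python
import re

# The single pattern proposed above, anchored at the start by re.match.
ID_PATTERN = re.compile(r"[MN][0-9]{4}( [MN][0-9]{4})*$")

def is_homogeneous_id_line(text):
    """Return True if the text following '%I ' matches the proposed pattern."""
    return ID_PATTERN.match(text) is not None

for text in ["M2779 and M2780", "M2796 = M2797", "M2779 M2780"]:
    print(text, "->", is_homogeneous_id_line(text))
```

The two current lines fail the check, while a rewritten form such as
"M2779 M2780" passes.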

> For (P15) the keyword is documented here
> https://oeis.org/wiki/User:Charles_R_Greathouse_IV/Keywords
> and is quite rare, as you noticed. This isn't a bug. You might also see the
> super-rare keyword allocating from time to time.

Yes, but these keywords are not documented in what for me (as an
outsider) looks like the authoritative help pages:

  https://oeis.org/eishelp1.html
  https://oeis.org/eishelp2.html

If you can confirm that your list is authoritative (or as close to
authoritative as we have), then I can adapt my parser to emit fewer
false positive warnings.

> Coincidentally I was just considering the possibility of a QA process for
> the OEIS recently. Nothing exists at the moment though.

I'm willing to assist, if extra hands are needed.

I can also make full lists of all generated issues available. For now,
I do want to make sure that my parser implements the best possible set
of rules. Apart from the Wiki pages that you have pointed out, are
there any other pages I should be aware of?

Regards
  Sidney


