[seqfan] Re: Trying to use the author field of the oeis for FindStat

Rubey Martin martin.rubey at tuwien.ac.at
Tue Jun 2 20:58:38 CEST 2015


Dear Robert!

> Basically the problem is that the entries are written by many different
> people over many years, and we are not always diligent about following
> formatting rules.

Yes, and I was actually quite surprised that it works so well.  After all, I
didn't find that *many* "bad" author fields.

> It might be best if you could make your software be more
> flexible at parsing names and dates.

> Anything between "_"'s is an author. Month names (in all their possible 
> variants) are pretty easy to recognize; a four-digit integer from 1990 to 
> the current year is a year; a one or two digit integer before or after a 
> month name is a day. Anything else can probably be ignored, as a first 
> approximation.
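
For concreteness, the heuristic quoted above could be sketched roughly as
follows.  This is only an illustration under assumptions: the function names
(month_number, heuristic_parse) and the sample strings are made up, and month
"variants" are taken to mean any prefix of at least three letters of an
English month name.

```python
import re
from datetime import date

MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]

def month_number(word):
    """Return 1..12 if word is a month name or a prefix of one (>= 3 letters)."""
    w = word.lower()
    if len(w) < 3:
        return None
    for i, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(w):
            return i
    return None

def heuristic_parse(field, today=None):
    """Hypothetical helper: return (authors, year, month, day), None if absent."""
    today = today or date.today()
    # Anything between underscores is an author.
    authors = re.findall(r"_([^_]+)_", field)
    rest = re.sub(r"_[^_]+_", " ", field)

    # A four-digit integer from 1990 to the current year is a year.
    year = None
    for m in re.finditer(r"\b\d{4}\b", rest):
        if 1990 <= int(m.group()) <= today.year:
            year = int(m.group())
            break

    # The first recognizable month name; a one- or two-digit integer
    # directly before or after it is the day.
    month = day = None
    for m in re.finditer(r"[A-Za-z]+", rest):
        month = month_number(m.group())
        if month is None:
            continue
        before = re.search(r"(?<!\d)(\d{1,2})\s*$", rest[:m.start()])
        after = re.match(r"[.,]?\s*(\d{1,2})\b", rest[m.end():])
        hit = before or after
        day = int(hit.group(1)) if hit else None
        break
    return authors, year, month, day
```

On a made-up field such as "_Jane Doe_, Jun 02 2015" this finds the author
and the date.  On "Jane Doe (jane(AT)example.org), 2 June 2015" it still finds
the date but returns no author at all, since nothing is between underscores.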

Well, I tried for quite a long time, but I couldn't come up with
anything even modestly robust.  Note that there are many sequences
where the author is not between underscores and is not followed by an
email address in parentheses.

What I would appreciate most is to have some (flexible) format for the
author field, together with the willingness to "correct" a specific author
field (one at a time) if it does not comply.

So, what I'd suggest is:

author separator author separator ... separator date

where separator is "," or " and " or ";"
and author is "name" or "name (email)" or "_name_"
and date may be preceded or followed by a parenthetical remark.

Of course, it would be important that no author or date contains a
separator.
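
A strict checker for the format suggested above might look roughly like this.
It is a minimal sketch, not an existing FindStat or OEIS tool: the names
(parse_author_field, SEPARATOR, AUTHOR, REMARK) and the sample strings are
hypothetical, and the date itself is returned as uninterpreted text.

```python
import re

# separator is "," or ";" or " and "
SEPARATOR = re.compile(r"\s*[,;]\s*|\s+and\s+")
# author is "_name_" or "name" or "name (email)"
AUTHOR = re.compile(
    r"^(?:_(?P<linked>[^_]+)_"
    r"|(?P<plain>[^()_]+?)(?:\s*\((?P<email>[^()]+)\))?)$"
)
# parenthetical remark before or after the date
REMARK = re.compile(r"\(([^()]*)\)")

def parse_author_field(field):
    """Return (authors, date_text, remarks) or raise ValueError.

    authors is a list of (name, email) pairs; email is None if absent.
    """
    parts = [p.strip() for p in SEPARATOR.split(field.strip())]
    if len(parts) < 2 or any(not p for p in parts):
        raise ValueError("expected: author separator ... separator date")
    *author_parts, date_part = parts

    authors = []
    for part in author_parts:
        m = AUTHOR.match(part)
        if m is None:
            raise ValueError("not a valid author: %r" % part)
        authors.append((m.group("linked") or m.group("plain"),
                        m.group("email")))

    remarks = REMARK.findall(date_part)
    date_text = REMARK.sub("", date_part).strip()
    if not date_text:
        raise ValueError("missing date")
    return authors, date_text, remarks
```

A field that raises ValueError would be a candidate for manual correction.
The sketch also shows why the last condition matters: a date written as
"June 2, 2015" is split at its comma, so "June 2" is taken for an author and
"2015" for the date.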

Best,

Martin

