[seqfan] OEIS: what went wrong

Russ Cox rsc at swtch.com
Sun Nov 14 22:42:15 CET 2010


[This is a lot of detail for anyone who is truly interested
in what happened yesterday.  Apologies for having nothing
to do with actual integer sequences.

Executive summary: everything should be working again,
and no edits were lost.  Happy editing!  -Russ]


THE BUG

The forgotten edits were caused by a bug in my usage of the SQLite3
database library.  Each write to the database is wrapped in a
transaction to ensure that revisions proceed sequentially.  If two
people try to edit revision #3 independently, the second attempt
fails, and either the web server resolves the two edits or it shows an
editing conflict page to the person editing.

I implemented this by using the SQL SAVEPOINT command before reading
the latest entry, and then if it was too new, using the ROLLBACK TO
SAVEPOINT command to back out.  Unfortunately, I missed the fact that
while this does undo the change it does not finish the transaction: a
subsequent RELEASE SAVEPOINT command is necessary, and I didn't run it.

The effect was that after the first such edit conflict, all further
changes to the database were wrapped in that not-quite-aborted
transaction, which was never going to commit.  Restarting the server
then abandoned the unfinished transaction, making the server appear to
forget the edits since the conflict.  We were restarting the server
frequently enough in the first few days that this bad state never
lasted for long, which caused the bug to go unnoticed (or at least
unreported) at first.

I wrote a test to check the behavior of the database in this situation
and then fixed the bug by adding the necessary RELEASE SAVEPOINT
command.
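
For anyone who wants to see the failure mode directly, here is a
minimal sketch using Python's standard sqlite3 module.  It is not the
OEIS server code: the revisions table, the save_edit and
surviving_revisions helpers, and the release_on_conflict flag are all
made up for illustration.  Passing release_on_conflict=False
reproduces the missing RELEASE SAVEPOINT after a conflict.

```python
import os
import sqlite3
import tempfile


def save_edit(conn, seq, base_rev, text, release_on_conflict=True):
    """Store text as revision base_rev + 1 unless a newer revision exists."""
    conn.execute("SAVEPOINT edit")
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(rev), 0) FROM revisions WHERE seq = ?", (seq,)
    ).fetchone()
    if latest != base_rev:
        # Conflict: undo any work done since the savepoint ...
        conn.execute("ROLLBACK TO SAVEPOINT edit")
        # ... and release it too.  Skipping this step leaves the
        # enclosing transaction open, which is the bug described above.
        if release_on_conflict:
            conn.execute("RELEASE SAVEPOINT edit")
        return False
    conn.execute(
        "INSERT INTO revisions VALUES (?, ?, ?)", (seq, base_rev + 1, text)
    )
    conn.execute("RELEASE SAVEPOINT edit")
    return True


def surviving_revisions(release_on_conflict):
    """Make three edits, simulate a restart, and count the stored rows."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None keeps the Python module from opening
    # transactions on its own, so only the SQL below controls them.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE revisions (seq TEXT, rev INTEGER, text TEXT)")
    save_edit(conn, "A000001", 0, "first", release_on_conflict)   # accepted
    save_edit(conn, "A000001", 0, "stale", release_on_conflict)   # conflict
    save_edit(conn, "A000001", 1, "second", release_on_conflict)  # accepted
    conn.close()  # stands in for a restart: an open transaction is dropped
    conn = sqlite3.connect(path)
    (count,) = conn.execute("SELECT COUNT(*) FROM revisions").fetchone()
    conn.close()
    return count
```

In this sketch surviving_revisions(True) returns 2, while
surviving_revisions(False) returns 1: the edit accepted after the
conflict sits inside the never-committed transaction and disappears
at the simulated restart.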


RECOVERY

Although the database had discarded the edits, they were not lost.  In
addition to writing changes to the SQLite3 database, the OEIS server
records them in a separate append-only edit log.  It also records all
HTTP requests that involve editing the database in an append-only
request log.  (Append-only logs have the nice property that they are
easy to back up and very hard to screw up, because data is never
overwritten.) If something bad happens to the database, like what
happened yesterday, the append-only edit log contains sufficient
information to rebuild the database from scratch.  If for some reason
the edit log is not enough, we can fall back on the actual HTTP
requests, which contain all the information the server had when it was
making changes.

The backup mechanisms worked as intended: the database failed, but the
edit log had all the necessary information, so nothing was lost.  We
did not have to resort to the HTTP log, but we could have.
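
As a rough illustration of why an append-only edit log is enough to
rebuild the database, here is a small Python sketch.  The log format
(one JSON object per line), the field names, and the append_edit and
rebuild helpers are assumptions of mine for the example; the real
OEIS edit log is not described in this message.

```python
import json
import sqlite3


def append_edit(log_path, seq, rev, text):
    """Record one edit at the end of the log; nothing is overwritten."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"seq": seq, "rev": rev, "text": text}) + "\n")


def rebuild(log_path):
    """Replay every logged edit, in order, into a fresh database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE revisions ("
        "seq TEXT, rev INTEGER, text TEXT, PRIMARY KEY (seq, rev))"
    )
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            edit = json.loads(line)
            db.execute(
                "INSERT INTO revisions VALUES (?, ?, ?)",
                (edit["seq"], edit["rev"], edit["text"]),
            )
    db.commit()
    return db
```

Because the log is written independently of the database, a database
that forgets a transaction does not affect it, and replaying the log
restores every recorded edit.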

From the time we launched the new OEIS to the time I put the server in
read-only mode last night, people made 1,906 edits.  Of those, the
transaction bug had caused the database to discard 413.  Most went
unnoticed and were trivially restored by reapplying them during the
reconstruction of the database from the edit log.

There were 180 edits affecting 92 sequences, though, that were made
after the database had forgotten earlier changes to the same sequence,
so that these independent changes had to be reconciled.  70 of those
changes, affecting 36 sequences, involved a single edit made twice,
identically (typically publishing a draft).  The remaining 56
sequences had more involved independent edits, but essentially all of
them were repeating forgotten changes, so there were no real
conflicts.

In two cases, the forgotten changes caused the A-number allocator to
give the same A-number to two different people.  In both cases I moved
the second person's sequence to a new A-number and have emailed the
affected parties to let them know about the new number.

The oeis.org server is now running with the fixed code and the
reconstructed database, with no lost edits, thanks to the separate
edit log.


AFFECTED SEQUENCES

This section lists the sequences that had non-trivial independent
changes that needed reconciling, along with the people and kinds of
changes involved.  I am listing them here mainly for transparency and
so that the people involved can check them if they wish.  All the
edits are incorporated; no action is required.

A001085 / Noe / out of sync edits
A001538 / Librandi, Noe / out of sync edits
A005818 / Sloane, Noe / duplicate edits
A029745 / Noe, Hasler / out of sync edits
A045725 / Greathouse / duplicate edits
A045726 / Greathouse / duplicate edits
A045727 / Greathouse / duplicate edits
A045728 / Greathouse / duplicate edits
A045729 / Greathouse / duplicate edits
A045732 / Greathouse / duplicate edits
A045733 / Greathouse / duplicate edits
A045813 / Greathouse / out of sync edits
A045855 / Greathouse / duplicate edits
A045856 / Greathouse / duplicate edits
A045857 / Greathouse / duplicate edits
A045858 / Greathouse / duplicate edits
A045859 / Greathouse / duplicate edits
A048460 / Pol, Sloane / duplicate edits
A065186 / Greathouse, Noe / duplicate approval
A066270 / Greathouse, Noe / duplicate approval
A066613 / Greathouse, Noe / duplicate approval
A066700 / Greathouse, Noe / duplicate approval
A066733 / Greathouse, Noe / duplicate approval
A066734 / Greathouse, Noe / duplicate approval
A071791 / Greathouse, Noe, Sloane / duplicate approval
A082381 / Greathouse, Noe / duplicate approval
A098719 / Noe / duplicate edit
A103969 / Stephan, Sloane, Brockhaus / duplicate review, approval
A104249 / Librandi, Stephan, Noe / duplicate review, approval
A104736 / Greathouse, Sloane, Noe / duplicate review, approval
A105044 / Librandi, Stephan, Noe / duplicate review, approval
A105680 / Librandi, Brockhaus / duplicate review, approval
A105775 / Librandi, Stephan, Brockhaus / duplicate edit, review, approval
A106009 / Librandi, Brockhaus / duplicate review, approval
A108596 / Librandi, Noe, Brockhaus / duplicate edit, review, approval
A108856 / Librandi, Noe, Brockhaus / duplicate edit, review, approval
A108857 / Librandi, Noe, Brockhaus / duplicate edit, review, approval
A108874 / Librandi, Noe, Brockhaus / duplicate edit, review, approval
A108899 / Librandi, Noe, Brockhaus / duplicate edit, review, approval
A108900 / Librandi, Noe, Brockhaus / duplicate edit, review, approval
A111292 / Librandi, Brockhaus, Noe / duplicate edit, review, approval
A111380 / Greathouse, Sloane, Noe / duplicate review, approval
A115390 / Stephan, Sloane, Myers / duplicate review, approval
A117619 / Librandi, Brockhaus, Noe / duplicate edit, review, approval
A124011 / Librandi, Brockhaus, Noe / duplicate edit, review, approval
A124163 / Librandi, Cox, Noe / duplicate edit, review, approval
A124185 / Librandi, Cox, Noe / duplicate edit, review, approval
A139250 / Pol / duplicate edits
A169911 / Greathouse, Noe / duplicate review, approval
A171549 / Greathouse, Stephan, Sloane / duplicate review, approval
A175419 / Greathouse, Sloane / duplicate edit, review, approval
A175424 / Greathouse, Sloane / duplicate edit, review, approval
A176028 / Noe, Greathouse / out of sync edits
A181765 / Zumkeller, Jackson / same A-number allocated twice
A181766 / Librandi, Curtz / same A-number allocated twice
A181776 / Lagneau, Noe / duplicate review, approval
