[seqfan] Re: Conservation of web pages linked to from OEIS

Brendan McKay Brendan.McKay at anu.edu.au
Fri Oct 11 02:26:33 CEST 2013


A good fraction of pages on the web can't be adequately archived just by
saving the html file initially linked to, and there might not even be one.
You have to save images as well, and active elements (that use a program
running on the server to compute content) are more or less impossible to
archive. The fraction of web pages generated directly from a html file
decreases every year.
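
To illustrate the point about images and other dependent files, here is a
minimal Python sketch that lists the files a static page pulls in, so they
could be saved alongside the html. The names ResourceCollector and
subsidiary_files are hypothetical, only img, script and link tags are
examined, and content computed on the server or by scripts is not captured
at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect URLs of files that a page loads (hypothetical helper)."""

    # tag -> attribute naming a file the page depends on (illustrative subset)
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.resources.append(urljoin(self.base_url, value))


def subsidiary_files(html_text, base_url):
    """Return absolute URLs of the images, scripts and linked files in a page."""
    collector = ResourceCollector(base_url)
    collector.feed(html_text)
    return collector.resources
```

Ordinary hyperlinks (a tags) are deliberately ignored: they point to other
pages rather than to parts of this one.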

You could instead consider archiving an image of the linked page, rather
than its html content.  That won't handle web sites with complex
structure, but it will permanently record what first appeared when the
link was clicked.  Presumably there are tools for this.

Even without that problem, I frankly can't see this ever getting more
than partial coverage, due to the labour of asking permission.  As far
as I know permission is not legally required for archiving, and especially
not for archiving the page image (otherwise Google is in billions of
troubles).
I suggest you instead create an opt-out system and automatically archive
external links as they are added.
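
A rough sketch of the opt-out idea in Python, under stated assumptions: the
links of an entry are available as text containing <a href="..."> anchors,
and the opt-out list is simply a set of host names. The function names and
the opt-out set are hypothetical, not anything the OEIS provides.

```python
import re
from urllib.parse import urlparse

# Matches the target of anchors written as <a href="...">
HREF = re.compile(r'<a\s+href="([^"]+)"', re.IGNORECASE)


def external_links(entry_links_text):
    """Return absolute http(s) links that point outside oeis.org."""
    found = []
    for url in HREF.findall(entry_links_text):
        parts = urlparse(url)
        host = (parts.hostname or "").lower()
        if parts.scheme not in ("http", "https") or not host:
            continue  # relative link, e.g. an already cached a123456.html
        if host == "oeis.org" or host.endswith(".oeis.org"):
            continue  # internal link, nothing to archive
        found.append(url)
    return found


def links_to_archive(entry_links_text, opted_out_hosts):
    """Keep external links whose host has not asked to be left out."""
    opted_out = {h.lower() for h in opted_out_hosts}
    return [
        url
        for url in external_links(entry_links_text)
        if urlparse(url).hostname.lower() not in opted_out
    ]
```

Anything returned by links_to_archive would then be handed to whatever
archiving tool is chosen; a site owner who objects is added to the opt-out
set and skipped from then on.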


Cheers,
Brendan.

>------------------------------
>Message: 6
>Date: Tue, 8 Oct 2013 15:15:15 -0400
>From: Neil Sloane <njasloane at gmail.com>
>To: Sequence Fanatics Discussion list <seqfan at list.seqfan.eu>
>Subject: [seqfan] Conservation of web pages linked to from OEIS
>Message-ID:
>	<CAAOnSgTrXEt6wO0Qan_r=i1G26G_Jx3wp9SE0O3SEokF36_OfQ at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>I just added a note to the
>http://oeis.org/wiki/Suggested_Projects page on the wiki
>about this, but this is important enough that I'm sending out an expanded
>version to the list.
>
>In 50 years most of the present links to personal web pages won't work any
>more, unless we have a backup copy on the OEIS web site.
>
>   1. So we should start making copies of all the private web pages that we
>   link to from the sequence entries. If possible we try to get permission
>   first, of course, but in many cases it may already be too late for that,
>   and we will have to use the Wayback Machine to recover the page.
>   What I recommend is that we have a duplicate for each link, like this:
>   J. Smith, <a href="etc etc, [the original link]
>   J. Smith, <a href="a123456.***> [cached copy]
>   For an example where we have a successfully cached copy, see
>   A213000 <http://oeis.org/A213000>.
>   For an example of an unsuccessful cache, see
>   A080104 <http://oeis.org/A080104>.
>
>Furthermore, Hugo P. has pointed out to me that many publishers seem to
>have changed their policy, and articles that were formerly free are now
>hidden behind pay walls. So that links that used to work don't any more.
>I'm not sure how to deal with that problem.
>
>Anyway, there are a LOT of links in the OEIS, so this is something where
>everyone can help. This "crowd-sourcing" may be the only
>way to solve the problem.
>
>If you see a link in A123456, say, to
>J. Smith, <a href="http://homepage.com/file.html">Title</a>
>
>then ask permission from J. Smith, explaining why we are doing this
>(basically because we hope the OEIS will be around for hundreds of years,
>and so it is to everyone's advantage to preserve these pages),
>
>and then make a copy called a123456.html
>(it should start with the A-number, with lower case a)
>
>edit A123456 to upload a123456.html,
>and create a link saying
>J. Smith, <a href="a123456.html">Title</a> [Cached copy]
>
>The same thing with jpg, gif, pdf, etc files.
>
>Of course if there are subsidiary files called by the page,
>you need to copy them too. If you have questions, ask me or David Applegate
>or Russ Cox or Charles Greathouse for help.
>
>Neil




