[seqfan] Re: Broken link hunt
georg.fischer at t-online.de
Sun Aug 2 21:33:59 CEST 2020
Though it is very desirable, this is a Sisyphean task.
In 2009 we repaired several hundred broken links,
and at the beginning of 2019 I made another attempt
at a big broken-link action, but abandoned it after weeks.
I still have a list of some 380 host addresses which
are not accessible (many of the host/~name URLs are
personal home pages).
What I can provide rather easily is a complete list
of all URLs referred to in the OEIS that point to
sites outside it. Then you would not need to crawl
the OEIS server (and cause load on it), but could simply
check the links to the outside world (you must obey the
robots.txt rules - some sites block you if you don't).
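As a sketch of such an external link check (the function names, the user-agent string, and the error handling are my own assumptions, not part of any existing tool; standard library only):

```python
# Sketch: check one external URL while obeying its robots.txt.
import urllib.error
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

def robots_allows(url, agent="OEIS-link-checker"):
    """Return True if the host's robots.txt permits fetching url."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # robots.txt unreachable: assume fetching is allowed
    return rp.can_fetch(agent, url)

def check_link(url, agent="OEIS-link-checker", timeout=10):
    """Return the HTTP status code, or None if blocked or unreachable."""
    if not robots_allows(url, agent):
        return None
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status  # e.g. 200 for a live page
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 404 for a dead link
    except OSError:
        return None  # DNS failure, timeout, refused connection, ...
```

A HEAD request is usually enough to detect a dead link without downloading the page body, though a few servers answer HEAD incorrectly and would need a GET fallback.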
The problem is not so much to detect the broken links, but
- to decide whether there is already a replacement link
in parallel, and
- to find out whether
. there is a simple replacement and the old link is obsolete,
. there is some replacement, but the old link should be kept,
. there is a replacement in the Internet Archive (Wayback Machine),
. no replacement can be found,
- and then to edit the replacements into the many sequences
which share the same broken link.
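For the Wayback Machine case above, the Internet Archive exposes an availability API that can automate the lookup; a minimal sketch (the function name and error handling are my assumptions):

```python
# Sketch: look up the closest Wayback Machine snapshot for a dead URL.
import json
import urllib.request
from urllib.parse import quote

def wayback_snapshot(url, timeout=10):
    """Return the closest archived snapshot URL, or None if none exists."""
    api = "https://archive.org/wayback/available?url=" + quote(url, safe="")
    try:
        with urllib.request.urlopen(api, timeout=timeout) as resp:
            data = json.load(resp)
    except OSError:
        return None  # API unreachable or request rejected
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Each broken link could then be pre-classified automatically (snapshot found / nothing found), leaving only the editorial judgment to a human editor.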
We did the latter from time to time in specific cases,
often with the aid of the author (target link owner).
In all cases we should ask or involve Neil before
we attempt big repair actions. The editing work should
be split among several editors, and we need a workflow.
Best regards - Georg
On 02.08.2020 at 20:41, michel.marcus at free.fr wrote:
> I think your file.txt should have the OEIS link line, to be able to search for the link title on the web.
> And the A_number, to know where the corrected URL must be entered.
> ----- Original Message -----
> From: "Elijah Beregovsky" <elijah.beregovsky at gmail.com>
> To: "Sequence Fanatics Discussion list" <seqfan at list.seqfan.eu>
> Sent: Sunday, August 2, 2020 18:14:27
> Objet: [seqfan] Broken link hunt
> Hi, Seqfans!
> Everyone knows that there are loads of rotten links in the OEIS. For the
> past couple of days I've been trying to locate and fix as many as I can.
> But then my father suggested I automate this process, so I did exactly
> that. I made a (not very sophisticated) crawler that finds all links
> returning Error 404 and stores them in a file. (
> https://github.com/BIGfoot496/OEIS-crawler) After approximately an hour of
> searching it returned a file with over a hundred links (in attachment).
> That's definitely not all of the dead links, and I'm going to run the code
> for a much longer time, but this is already too much work for me to do
> alone. Let's fix them!
> PS: I wouldn't reject coding help, because the crawler isn't nearly optimal
> yet. It only catches 404s and slows down significantly after working for
> some time.
Dr. Georg Fischer, Rotteckring 19, D-79341 Kenzingen
Tel. (07644) 913016, +49 175 160 7788, www.punctum.com