[seqfan] Re: Broken link hunt

jean-paul allouche jean-paul.allouche at imj-prg.fr
Sun Aug 2 21:44:35 CEST 2020


Dear all

The discussion about broken links gave me a (possibly stupid) idea.
As you know, archive.org saves (some) sites on its own, /but/ it also lets
anyone save an existing page on request. One possibility would be, whenever
a link is added to the OEIS, to also save the page on archive.org and put
the archived link on the OEIS as well. Of course this doubles the work, but
it is certainly better than doing it afterwards for a huge number of links.
The one problem I foresee is whether archive.org would ultimately accept
such a huge number of individual save requests.

jean-paul
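
Jean-Paul's proposal can be sketched in Python. This is a minimal, offline sketch that assumes the Wayback Machine's public endpoints: retrieving `https://web.archive.org/save/<url>` asks for a fresh capture ("Save Page Now"), and `https://archive.org/wayback/available?url=<url>` returns JSON describing an existing snapshot. The function names are hypothetical, the HTTP requests themselves are left out, and the save service may rate-limit bulk use or require an account, which is exactly the concern raised above.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Wayback Machine endpoints (assumed; "Save Page Now" may rate-limit
# bulk use or require a logged-in account).
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_request_url(url: str) -> str:
    """URL whose retrieval asks the Wayback Machine to capture `url` now."""
    return SAVE_ENDPOINT + url


def availability_query_url(url: str) -> str:
    """Availability API query for an archived snapshot of `url`."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Archived URL from an availability API JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

An editor adding a link would request `save_request_url(link)` once, then query `availability_query_url(link)` and store the URL returned by `closest_snapshot` next to the original link.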


On 02/08/2020 at 21:33, Georg.Fischer wrote:
> Hi Elijah,
>
> though it is very desirable, this is a Sisyphean task.
>
> In 2009 we repaired several hundred broken links,
> and at the beginning of 2019 I made another attempt at
> a big broken-link cleanup, but abandoned it after some weeks.
> I still have a list of some 380 host addresses that
> are not accessible (many of the host/~name URLs are
> endangered).
>
> What I can provide rather easily is a complete list
> of all URLs referred to in the OEIS that point to
> external sites. Then you would not need to crawl
> the OEIS server (and put load on it), but could simply
> check the links to the outside world (you must obey
> robots.txt rules; some sites block you if you don't).
>
> The problem is not so much to detect the broken links, but
> - to decide whether a replacement link already exists
>   alongside the broken one, and
> - to find out whether
>   . there is a simple replacement and the old link is obsolete,
>   . there is some replacement, but the old link should be kept,
>   . there is a replacement in the Internet Archive (Wayback Machine),
>   . no replacement can be found,
> - and then to edit the replacements in the many sequences
>   that have the same broken link.
>
> We did the latter from time to time in specific cases,
> often with the help of the author (the owner of the target link).
> In all cases we should ask or involve Neil before
> we attempt big repair actions. The editing work should
> be split among several editors, and we would need a
> mechanism to control the workflow.
>
> Best regards - Georg
>
>
> On 02.08.2020 at 20:41, michel.marcus at free.fr wrote:
>> I think your file.txt should include the full OEIS link line, so that
>> one can search the web for the link title, and also the A-number, so
>> that we know where the corrected URL must be entered.
>> Best.
>> MM
>>
>> ----- Original Message -----
>>
>> From: "Elijah Beregovsky" <elijah.beregovsky at gmail.com>
>> To: "Sequence Fanatics Discussion list" <seqfan at list.seqfan.eu>
>> Sent: Sunday, 2 August 2020 18:14:27
>> Subject: [seqfan] Broken link hunt
>>
>> Hi, Seqfans!
>> Everyone knows that there are loads of rotten links in the OEIS. For the
>> past couple of days I've been trying to locate and fix as many as I can.
>> But then my father suggested I automate this process, so I did exactly
>> that. I made a (not very sophisticated) crawler that finds all links
>> returning HTTP error 404 and stores them in a file
>> (https://github.com/BIGfoot496/OEIS-crawler). After about an hour of
>> searching it returned a file with over a hundred links (attached).
>> That's definitely not all of the dead links, and I'm going to run the
>> code for much longer, but this is already too much work for me to do
>> alone. Let's fix them!
>> Elijah
>>
>> PS: I would welcome coding help, because the crawler is still far from
>> optimal. It only catches 404s and slows down significantly after running
>> for some time.
>>
>>
>
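
Georg's suggestion of checking only the external URLs while obeying robots.txt can be sketched with Python's standard `urllib.robotparser`. This minimal sketch assumes the robots.txt text for each host has already been downloaded (one file per host, at the location given by `robots_url`); the user-agent string and the one-second default delay are hypothetical choices.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string for the link checker.
USER_AGENT = "OEIS-link-checker"


def robots_url(url: str) -> str:
    """Location of the robots.txt file governing `url` (one per host)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_check(parser: RobotFileParser, url: str) -> bool:
    """True if the host's robots.txt lets our user agent fetch `url`."""
    return parser.can_fetch(USER_AGENT, url)


def delay_for(parser: RobotFileParser, default: float = 1.0) -> float:
    """Seconds to wait between two requests to the same host."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default
```

Fetching robots.txt once per host and caching the parser keeps the number of extra requests equal to the number of distinct hosts rather than the number of links.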

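Georg's last point, editing the same replacement in every sequence that cites a broken link, suggests grouping the checker output by URL before anyone starts editing. The sketch below assumes that output is a list of (A-number, URL) pairs (a hypothetical format) and records the four possible outcomes from his list as an `Outcome` enum; all names are illustrative, not an existing OEIS tool.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Outcome(Enum):
    """Possible decisions for one broken URL, following Georg's list."""
    REPLACE = "simple replacement, old link obsolete"
    ADD_KEEP_OLD = "some replacement, old link kept"
    WAYBACK = "replacement in the Internet Archive"
    NOT_FOUND = "no replacement found"


def group_by_url(occurrences: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each broken URL to the sorted A-numbers of the sequences citing it."""
    groups = defaultdict(set)
    for a_number, url in occurrences:
        groups[url].add(a_number)  # a set ignores repeated citations in one entry
    return {url: sorted(a_numbers) for url, a_numbers in groups.items()}


def worklist(occurrences: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Broken URLs, those cited by the most sequences first (ties by URL)."""
    groups = group_by_url(occurrences)
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
```

Sorting by group size puts first the URLs whose single fix repairs the most sequences; storing one `Outcome` per URL would give editors a simple shared record of which decisions have been made.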

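Michel's request (keep the whole link line and the A-number) could be met by extracting links from the OEIS internal format rather than from rendered pages. This sketch assumes that links appear on `%H` lines of the form `%H A000001 ..., <a href="URL">title</a>` with a double-quoted `href`; entries with other markup would need a more tolerant parser, and the output layout is a hypothetical choice.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed shape of a link line in the OEIS internal format, e.g.
#   %H A000001 Some Author, <a href="http://host/path">Title of page</a>
LINK_LINE = re.compile(r"^%H\s+(A\d{6})\s+(.*)$")
ANCHOR = re.compile(r'<a\s+href="([^"]+)"\s*>(.*?)</a>', re.IGNORECASE)

Record = Tuple[str, str, str, str]  # (A-number, URL, title, full line)


def link_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per external link found on a %H line."""
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINK_LINE.match(line)
        if not match:
            continue
        a_number, rest = match.groups()
        for url, title in ANCHOR.findall(rest):
            # Relative links such as b-files live on the OEIS server itself.
            if url.startswith(("http://", "https://")):
                yield a_number, url, title, line


def to_tsv(records: Iterable[Record]) -> str:
    """One tab-separated row per link: A-number, URL, title, full line."""
    return "\n".join("\t".join(fields) for fields in records)
```

Each row then carries the title to search for on the web and the A-number where the corrected URL must go.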

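Elijah's PS notes that the crawler only catches 404s and slows down over time. The sketch below is not based on his code; it shows one way to check a prepared list of external URLs: deduplicate them, send each a `HEAD` request with a timeout so a hanging server cannot stall the run, and classify the result more broadly than 404 alone. The category names, the user-agent string, and the worker count are assumptions; some servers reject `HEAD` (status 405), which is why that case goes to manual checking.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

USER_AGENT = "OEIS-link-checker"  # hypothetical name for the checker


def fetch_status(url: str, timeout: float = 15.0) -> Optional[int]:
    """Final HTTP status of `url` after redirects, or None if no HTTP answer
    arrived (bad URL, DNS failure, refused connection, timeout)."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:  # server answered with 4xx/5xx
        return error.code
    except (OSError, ValueError, http.client.HTTPException):
        return None


def classify(status: Optional[int]) -> str:
    """Sort a status into a coarse category for the repair workflow."""
    if status is None:
        return "unreachable"
    if 200 <= status < 400:
        return "ok"
    if status in (404, 410):
        return "gone"
    if status in (401, 403, 405, 429):
        return "check by hand"  # blocked, HEAD refused, or rate-limited
    return "error"


def check_all(urls: Iterable[str],
              fetch: Callable[[str], Optional[int]] = fetch_status,
              workers: int = 8) -> Dict[str, str]:
    """Check each distinct URL once, several at a time."""
    unique = sorted(set(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, unique))
    return {url: classify(status) for url, status in zip(unique, statuses)}
```

Passing a different `fetch` function makes the checker testable offline. This sketch does not throttle per host; a real run should also respect each host's robots.txt and crawl delay.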
