[seqfan] Re: evil crawlers (123people and the like)

Joerg Arndt arndt at jjj.de
Tue Jun 29 11:54:07 CEST 2010


There are several such sites
(123people_com, pipl_com, yasni_com).
The contents are generated by crawling the web.

See
 http://randominternet.blogspot.com/2009/07/123people-illegal-scraping-and.html
 http://randominternet.blogspot.com/2009/07/123people-dot-com-is-stealing-content.html
to get the picture.

If we can identify the names of the crawlers,
we can block them (provided they heed robots.txt).
Ah, here we go: on the (German) web page
 http://www.joomla-blog.net/webmaster-blog/personensuche-123people-pro-contra.html
we find:
-------- robots.txt ---------
User-agent: MyOnID
Disallow: /

User-agent: 123People
Disallow: /

User-agent: Pipl
Disallow: /

User-agent: Yasni
Disallow: /
-------- robots.txt ---------

I strongly suggest adding these lines to
 http://oeis.org/robots.txt
(and the corresponding AT&T file).
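
For anyone who wants to check the effect before editing the file, here is a
minimal sketch using Python's standard urllib.robotparser module. It is only
one reading of the robots.txt conventions: the crawlers themselves may match
names differently, and a crawler that ignores robots.txt is not stopped at all.
The user-agent tokens are the ones from the German page. The URL and the name
"Googlebot" are arbitrary examples, chosen here only for comparison.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt fragment quoted above, verbatim.
FRAGMENT = """\
User-agent: MyOnID
Disallow: /

User-agent: 123People
Disallow: /

User-agent: Pipl
Disallow: /

User-agent: Yasni
Disallow: /
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


rp = make_parser(FRAGMENT)

# Example URL, an assumption used only for illustration.
url = "http://oeis.org/A000045"

for agent in ["MyOnID", "123People", "Pipl", "Yasni", "Googlebot"]:
    # can_fetch() applies the first group whose User-agent name matches.
    print(agent, rp.can_fetch(agent, url))
```

It should print False for the four named agents and True for Googlebot:
no group in the fragment names Googlebot, and the fragment has no
"User-agent: *" group, so everything else stays allowed.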

Wait a second, the current file has:
-------- robots.txt ---------
User-agent: *
Disallow: /
Disallow: /w/
Disallow: /wiki/Special:Search
Disallow: /wiki/Special:Random
-------- robots.txt ---------

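The line "Disallow: /" under "User-agent: *" tells every crawler
that heeds robots.txt to stay away from the whole site.
We certainly do not want to block _all_ search engines, do we?
So I suggest removing the line
Disallow: /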
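
The same module (again only an approximation of how real crawlers read the
file) shows why that one line matters. Under "User-agent: *", the rule
"Disallow: /" matches every path, so the more specific lines after it change
nothing. The two URLs below are hypothetical examples, and "Googlebot" stands
in for any well-behaved search engine.

```python
from urllib.robotparser import RobotFileParser

# The current file, as quoted above.
CURRENT = """\
User-agent: *
Disallow: /
Disallow: /w/
Disallow: /wiki/Special:Search
Disallow: /wiki/Special:Random
"""

# The same file with the line "Disallow: /" removed.
PROPOSED = """\
User-agent: *
Disallow: /w/
Disallow: /wiki/Special:Search
Disallow: /wiki/Special:Random
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


# Hypothetical example URLs, chosen only for illustration.
page = "http://oeis.org/A000045"
wiki_internal = "http://oeis.org/w/index.php"

print(allowed(CURRENT, "Googlebot", page))            # False: "/" matches every path
print(allowed(PROPOSED, "Googlebot", page))           # True
print(allowed(PROPOSED, "Googlebot", wiki_internal))  # False: still under /w/
```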



By the way, here are our fine Fritzl friends. Running
% whois 123people.com
gives:

[owner-c] fname:             Helga
[owner-c] lname:             Bernold
[owner-c] org:               123people
[owner-c] address:           Stronsdorf 24
[owner-c] city:              Stronsdorf
[owner-c] pcode:             2153
[owner-c] country:           AT
[owner-c] state:             Austria
[owner-c] phone:             +43-664-4398603
[owner-c] fax:               +43-2526-6710
[owner-c] email:             domains at 123people.com


[admin-c] fname:             Martin
[admin-c] lname:             Stemeseder
[admin-c] org:               123people
[admin-c] address:           Linke Wienzeile 8/29
[admin-c] city:              Wien
[admin-c] pcode:             1060
[admin-c] country:           AT
[admin-c] state:             AT
[admin-c] phone:             +43-664-4398603
[admin-c] fax:               +43-2526-6710
[admin-c] email:             domains at 123people.com




* Jaume Oliver i Lafont <joliverlafont at gmail.com> [Jun 29. 2010 10:30]:
> Hello all,
> 
> This message is not about numbers, but about misuse of the OEIS from a
> third party, regarding personal data privacy.
> 
> There is something that appears to take e-mail addresses from the OEIS,
> replace (AT) with @, and then publish the result on the web.
> 
> My example is at http://www.123people.com/s/jaume+oliver
> 
> I assume they get the data from here, judging by the result of
> clicking on the e-mail address.
> 
> Checking for other contributors gives similar results.
> 
> I complained and received a response that did not match the complaint;
> chances are they did not even read my text.
> 
> Regards,
> Jaume Oliver
> 
> 
> _______________________________________________
> 
> Seqfan Mailing list - http://list.seqfan.eu/



