[seqfan] Re: evil crawlers (123people and the like)
Joerg Arndt
arndt at jjj.de
Tue Jun 29 11:54:07 CEST 2010
There are several such sites
(123people_com, pipl_com, yasni_com).
The contents are generated by crawling the web.
See
http://randominternet.blogspot.com/2009/07/123people-illegal-scraping-and.html
http://randominternet.blogspot.com/2009/07/123people-dot-com-is-stealing-content.html
to get the picture.
If the user-agent name of a crawler can be identified,
we can block it (provided it heeds robots.txt).
Ah, here we go: on the (German) web page
http://www.joomla-blog.net/webmaster-blog/personensuche-123people-pro-contra.html
we find:
-------- robots.txt ---------
User-agent: MyOnID
Disallow: /
User-agent: 123People
Disallow: /
User-agent: Pipl
Disallow: /
User-agent: Yasni
Disallow: /
-------- robots.txt ---------
I strongly suggest adding these lines to
http://oeis.org/robots.txt
(and the corresponding AT&T file).
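
A quick way to check what these lines would do is Python's standard
urllib.robotparser module. This is only a sketch: the page URL and the
"Googlebot" agent are examples I picked for illustration, and it only
shows how a well-behaved parser reads the rules, not what the crawlers
named above actually do.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, copied verbatim.
PROPOSED_RULES = """\
User-agent: MyOnID
Disallow: /
User-agent: 123People
Disallow: /
User-agent: Pipl
Disallow: /
User-agent: Yasni
Disallow: /
"""

# Example page URL, chosen only for illustration.
PAGE = "http://oeis.org/A000045"


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `rules` permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# The four named crawlers should be refused; any other agent is unaffected.
for agent in ("MyOnID", "123People", "Pipl", "Yasni", "Googlebot"):
    print(agent, is_allowed(PROPOSED_RULES, agent, PAGE))
```

Since none of the four records uses "User-agent: *", agents that are
not listed remain free to fetch everything.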
Wait a second, we have:
-------- robots.txt ---------
User-agent: *
Disallow: /
Disallow: /w/
Disallow: /wiki/Special:Search
Disallow: /wiki/Special:Random
-------- robots.txt ---------
We certainly do not want to block _all_ search engines, do we?
That is, I suggest removing the line
Disallow: /
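
To illustrate the effect of that one line, here is a small sketch with
Python's standard urllib.robotparser. The URLs and the "Googlebot"
agent are examples chosen for illustration; real search engines may
interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The current file as quoted above.
CURRENT_RULES = """\
User-agent: *
Disallow: /
Disallow: /w/
Disallow: /wiki/Special:Search
Disallow: /wiki/Special:Random
"""

# The same file with the blanket "Disallow: /" line removed.
TRIMMED_RULES = "\n".join(
    line for line in CURRENT_RULES.splitlines()
    if line.strip() != "Disallow: /"
)


def allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `rules` permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# Example URLs, chosen only for illustration.
SEQUENCE_PAGE = "http://oeis.org/A000045"
WIKI_SCRIPT = "http://oeis.org/w/index.php"

print(allowed(CURRENT_RULES, "Googlebot", SEQUENCE_PAGE))  # blanket rule applies
print(allowed(TRIMMED_RULES, "Googlebot", SEQUENCE_PAGE))  # blanket rule removed
print(allowed(TRIMMED_RULES, "Googlebot", WIKI_SCRIPT))    # /w/ is still disallowed
```

With the blanket line present every path is refused to every agent;
without it, only /w/ and the two Special: pages stay off limits.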
Btw. our fine Fritzl friends are:
% whois 123people.com
gives:
[owner-c] fname: Helga
[owner-c] lname: Bernold
[owner-c] org: 123people
[owner-c] address: Stronsdorf 24
[owner-c] city: Stronsdorf
[owner-c] pcode: 2153
[owner-c] country: AT
[owner-c] state: Austria
[owner-c] phone: +43-664-4398603
[owner-c] fax: +43-2526-6710
[owner-c] email: domains at 123people.com
[admin-c] fname: Martin
[admin-c] lname: Stemeseder
[admin-c] org: 123people
[admin-c] address: Linke Wienzeile 8/29
[admin-c] city: Wien
[admin-c] pcode: 1060
[admin-c] country: AT
[admin-c] state: AT
[admin-c] phone: +43-664-4398603
[admin-c] fax: +43-2526-6710
[admin-c] email: domains at 123people.com
* Jaume Oliver i Lafont <joliverlafont at gmail.com> [Jun 29. 2010 10:30]:
> Hello all,
>
> This message is not about numbers, but about misuse of the OEIS from a
> third party, regarding personal data privacy.
>
> There is something that appears to get e-mails from the OEIS, change
> (AT) to @, and then publish the result on the web.
>
> My example is at http://www.123people.com/s/jaume+oliver
>
> I assume they get the data from here because of the result when
> clicking on the e-mail address.
>
> Checking for other contributors gives similar results.
>
> I complained and received a response that did not match the complaint;
> chances are they did not even read my text.
>
> Regards,
> Jaume Oliver