How many fucking spybots do we need?
Today on the spybot circuit we're serving up a helping of Picmole, which is using Heritrix to do its crawling. Surprisingly, it still checks robots.txt, but who knows if they'll honor it down the road, because honoring robots.txt conflicts with accomplishing their stated goals.
Identifying their spider properly and crawling from easily identifiable IPs will also cause them problems as their activities increase, but being a new service they'll soon figure that out and probably go stealth like all the rest.
208.109.189.127 [ip-208-109-189-127.ip.secureserver.net.] requested 1 pages as "Mozilla/5.0 (compatible; heritrix/1.12.0 +http://www.picmole.com)"
Sorry, but your bot hit a firewall on your first attempt.
Abort, Retry, Ignore?
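For anyone who wants to try something similar, here is a rough Python sketch of the general idea: parse a report line in the format shown above and flag the request if the user agent contains a known token. This is not the firewall that actually stopped the bot; the names `BLOCKED_TOKENS`, `parse_report_line` and `is_blocked_agent` are made up for illustration, and the token list is only an assumption based on the user-agent strings quoted on this page.

```python
import re

# Hypothetical blocklist: substrings taken from the user-agent strings
# quoted on this page. Adjust to taste.
BLOCKED_TOKENS = ("heritrix", "picmole")

# Matches the report line format shown above:
#   <ip> [<reverse dns>] requested <n> pages as "<user agent>"
REPORT_LINE = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \[(?P<host>[^\]]*)\] '
    r'requested (?P<pages>\d+) pages as "(?P<agent>[^"]*)"'
)


def parse_report_line(line):
    """Return a dict with ip, host, pages and agent, or None if no match."""
    match = REPORT_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["pages"] = int(entry["pages"])
    return entry


def is_blocked_agent(agent):
    """Case-insensitive substring check against the blocklist."""
    lowered = agent.lower()
    return any(token in lowered for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    line = (
        '208.109.189.127 [ip-208-109-189-127.ip.secureserver.net.] '
        'requested 1 pages as "Mozilla/5.0 (compatible; heritrix/1.12.0 '
        '+http://www.picmole.com)"'
    )
    entry = parse_report_line(line)
    if entry is not None and is_blocked_agent(entry["agent"]):
        print("blocking", entry["ip"])
```

A substring match on the user agent only works while the bot identifies itself honestly. Once a crawler goes stealth, this kind of check stops being useful and you are left with behavioral or IP-based filtering.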
yep its boring created system stops all these bots at page one.
Hi Bill,
My name is Einav Itamar and I am the CTO of PicMole.com
I would like to let you (and everybody) know that we will always respect robots.txt - Politeness is the base of good crawling...
Additionally, we post our mail address within the HTTP headers, so website admins can explicitly exclude themselves from our list.
Best,
Einav Itamar
http://picmole.com
@Einav Itamar, CTO PicMole.com.
Your bot hit our server today, but there was no email address in the HTTP headers. It crawled from IP 67.202.12.250.
Alex Capo
hit us today from 174.142.82.15
paul
Hit our website today. I've checked the address, www.picmole.com but it responded with no default page. Stopping this bot now...
Hey, it's 2013 and guess what? Their crappy bot just hit our server...
They are still using their good old user-agent: "Mozilla/5.0 (compatible;picmole/1.0 +http://www.picmole.com)"
Of course, they don't care about robots.txt...
Why should a webmaster allow this bot?
Reverse Whois: Domains By Proxy, LLC
Hit our server today.
User agent: Mozilla/5.0 (compatible;picmole/1.0 +http://www.picmole.com)
IP address: 54.227.209.108
URL appendix: /text/javascript/ or trying to exec javascript