Thursday, April 17, 2008

Picmole, Yet Another Spybot!

There must be good money in spying on everyone, because it seems a new company springs up almost weekly trying to stake its claim in this new gold rush.

How many fucking spybots do we need?

Today on the spybot circuit we're serving up a helping of Picmole, which uses Heritrix to do its crawling. Surprisingly, it still checks robots.txt, but who knows if they'll honor it down the road, because honoring robots.txt conflicts with accomplishing their stated goals.
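For anyone who'd rather ask nicely first, here's a rough sketch of what a robots.txt exclusion looks like and how a polite crawler would evaluate it, using Python's standard urllib.robotparser. The "heritrix" token and the example.com URLs are my assumptions for illustration; I don't know which robots.txt token Picmole's crawler actually answers to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out a crawler matching the
# token "heritrix" (assumed token, not confirmed by Picmole).
ROBOTS_TXT = """\
User-agent: heritrix
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks this question before every fetch.
heritrix_allowed = parser.can_fetch("heritrix", "http://example.com/photos/")
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/photos/")

print("heritrix allowed:", heritrix_allowed)
print("other bot allowed:", other_allowed)
```

Of course this only works for as long as the bot bothers to ask.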

Identifying their spider properly and crawling from easily identifiable IPs will also cause them problems as their activity increases, but being a new service they'll soon figure that out and probably go stealth like all the rest.

208.109.189.127 [ip-208-109-189-127.ip.secureserver.net.] requested 1 pages as "Mozilla/5.0 (compatible; heritrix/1.12.0 +http://www.picmole.com)"
Sorry, but your bot hit a firewall on your first attempt.
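For the curious, stopping a self-identifying bot like this takes very little. What follows is only a minimal sketch of a User-Agent check, not the actual firewall used here; the blocklist and the function name is_blocked are made up for illustration.

```python
import re

# Hypothetical blocklist: patterns matched against the User-Agent header.
BLOCKED_AGENT_PATTERNS = [
    re.compile(r"heritrix", re.IGNORECASE),
]

def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent matches any blocked pattern."""
    return any(p.search(user_agent or "") for p in BLOCKED_AGENT_PATTERNS)

# The User-Agent string from the log entry above.
ua = "Mozilla/5.0 (compatible; heritrix/1.12.0 +http://www.picmole.com)"
print(is_blocked(ua))
```

A check like this only catches bots that announce themselves, which is exactly why the stealth ones are the bigger headache.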

Abort, Retry, Ignore?

2 comments:

Protect Your Content said...

Yep, it's boring. The system I created stops all these bots at page one.

Einav said...

Hi Bill,

My name is Einav Itamar and I am the CTO of PicMole.com
I would like to let you (and everybody) know that we will always respect robots.txt - Politeness is the base of good crawling...
Additionally, we post our mail address within the HTTP headers, so website admins can explicitly exclude themselves from our list.

Best,
Einav Itamar
http://picmole.com