Thursday, April 17, 2008

Picmole, Yet Another Spybot!

There must be good money in spying on everyone, because it seems a new company springs up almost weekly trying to stake its claim in this new gold rush.

How many fucking spybots do we need?

Today on the spybot circuit we're serving up a helping of Picmole, which is using heritrix to do its crawling. Surprisingly it still checks robots.txt, but who knows if they'll honor it down the road, because honoring robots.txt conflicts with accomplishing their stated goals.

Identifying their spider properly and crawling from easily identifiable IPs will also cause them problems as their activity increases, but being a new service they'll soon figure that out and probably go stealth like all the rest.

[] requested 1 pages as "Mozilla/5.0 (compatible; heritrix/1.12.0 +"
Sorry, but your bot hit a firewall on your first attempt.
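
For anyone wondering what blocking on the user agent looks like in principle, here's a minimal sketch in Python. It assumes a plain substring blocklist, which is only one way to do it and not necessarily how the blocker on this site works; the function name and the default token list are made up for illustration.

```python
def is_blocked_agent(user_agent: str, blocked_tokens=("heritrix",)) -> bool:
    """Return True if any blocked token appears in the User-Agent string.

    The comparison is case-insensitive. The token list is a hypothetical
    example, not a real blocklist.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocked_tokens)


# User agent as it appears in the log line above (the string is cut off there).
logged_ua = "Mozilla/5.0 (compatible; heritrix/1.12.0 +"
print(is_blocked_agent(logged_ua))
```

Of course this only works while the bot keeps announcing itself, which is exactly why these things tend to go stealth.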

Abort, Retry, Ignore?

Favcollector Bandwidth Waster

Here's another product of Canada doing the stupidest shit ever seen, collecting favicons.

It came and grabbed my icon, then hit the home page, which the bot blocker promptly stopped, so who knows what else it would've done beyond that.

[] "Favcollector/2.0 ("
From their FAQ:
Favcollector is a spider that searches the internet for favicons. It downloads and stores these favicons for each site it visits. It will go back once a month to see if the favicon has changed and will download the new icon if it is has, effictivly creating an archive of all favicons on the internet.

Spider my ass...

Spiders ask for robots.txt files, read them, and go away.

Not this piece of shit, as it just comes and takes what it wants without regard to the webmaster's wishes.
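
For reference, here's roughly what the polite version looks like, sketched in Python with the standard library's urllib.robotparser. The robots.txt content, the example.com URLs, and the may_fetch helper are all invented for illustration; this is not Favcollector's code, just the check a well-behaved crawler would run before fetching anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a webmaster might serve; not taken from a real site.
ROBOTS_TXT = """\
User-agent: Favcollector
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True only if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite spider runs this check before every request and skips the URL on False.
print(may_fetch(ROBOTS_TXT, "Favcollector/2.0", "http://example.com/favicon.ico"))
print(may_fetch(ROBOTS_TXT, "SomeOtherBot/1.0", "http://example.com/index.html"))
```

With these made-up rules the favicon request should come back disallowed, while the other bot is only kept out of /private/.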

Not only that, a bunch of trademarked icons are now on their site without permission, which will most likely make some crazed trademark enforcers start jumping up and down once they find that site.

BTW, run a damn spell checker on your site as the word is effectively, not "effictivly" unless that's the Canadian spelling.

Canasasearchbot For Canasians, Oh Canasa!

It's hard to resist commenting on a bot that can't even spell its own name or its country's name correctly.

[] "canasasearchbot("
However they got it right on their robots page:
User-agent: canadasearchbot
It did ask for robots.txt, but who knows if it was looking for "canasasearchbot" or "canadasearchbot"; it's a total crapshoot.
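
To show why the typo matters, here's a small sketch using Python's urllib.robotparser. It assumes the crawler matches robots.txt rules against whatever name it sends in its user agent, which is a guess since nobody outside knows what their code does. The rules, the example.com URL, and the is_blocked helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules from a webmaster who trusts the name on the bot's robots page.
RULES = """\
User-agent: canadasearchbot
Disallow: /
"""


def is_blocked(rules: str, bot_name: str, url: str = "http://example.com/") -> bool:
    """Return True if rules forbid bot_name from fetching url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(bot_name, url)


print(is_blocked(RULES, "canadasearchbot"))  # the name they document
print(is_blocked(RULES, "canasasearchbot"))  # the name the bot actually sent
```

Under that assumption the documented name gets blocked and the misspelled one sails right through, so a webmaster has to list both spellings to be safe.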

I tried their little search engine and it took it a really long time to come back with some really bad results.

Here's a "search tip": try searching your log file and examining what your crawler is putting in it before turning it loose on the world.

Nothing like that fine Canadian quality, eh?

Monday, April 14, 2008

Mozshot Tries Taking a Screenshot

Yet another Firefox-based screen shot tool hit my other site today just in time to take a screen shot of an error message telling them they weren't allowed to take screen shots without permission.

Details: []
"Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628;"
This thing appears to be open source, oh joy...