Monday, January 09, 2006

Bad Bots Bad Bots, What Ya Gonna Do?

Looks like more of them are hiding now, trying to slither thru my site under the radar, and it's just not working because my new visitor control panel, aka radar detector (hehe), lets me see this activity at a glance without breaking a sweat.

Today's idiot bot-du-jour was much like yesterday's, a better-behaved variety of slow-crawling scraper masked as a human, but it stepped in the spider trap and >SNAP!< lost an IP address.
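
For anyone wondering what a spider trap boils down to, here is a minimal sketch in Python. It is not the code running on my site; the class name, the trap URL and the in-memory ban list are all made up for illustration. The idea is just that a link no human would ever follow costs whoever requests it their IP address.

```python
class SpiderTrap:
    """Hypothetical sketch: ban any IP that requests a hidden trap URL."""

    def __init__(self, trap_paths):
        # URLs linked invisibly in the pages and disallowed in robots.txt,
        # so neither humans nor well-behaved robots should ever request them.
        self.trap_paths = set(trap_paths)
        self.banned = set()

    def handle(self, ip, path):
        """Return the HTTP status code to send for this request."""
        if ip in self.banned:
            return 403  # already caught, stays locked out
        if path in self.trap_paths:
            self.banned.add(ip)  # >SNAP!<
            return 403
        return 200


trap = SpiderTrap(["/trap/do-not-follow.html"])
print(trap.handle("10.0.0.1", "/index.html"))               # normal page
print(trap.handle("10.0.0.1", "/trap/do-not-follow.html"))  # steps in the trap
print(trap.handle("10.0.0.1", "/index.html"))               # now banned
```

A real version would need to persist the ban list and let the authorized robots through, which this sketch leaves out.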

What I'm starting to think is that this is just too much fuss, and a better idea might be to break all the navigation for spiders by converting everything to JavaScript navigation and then supplying authorized robots the normal version of navigation in <noscript> tags.

The only issue that needs to be resolved is whether or not the search engines would deem this cloaking, since the content that both the search engine and the end user see would be identical; only the technology of the navigation would change based on who requested the page.

The other idea I'm tossing about is to simply insert a captcha randomly after so many page views so that a bot would be stopped dead in its tracks, as opposed to a human, who would type in the text and continue on their way. Humans rarely get into hundreds of page views, and interjecting captchas about every 40 pages, over and over, would pretty much bring bots to a screeching halt.
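
Roughly what I have in mind, as a Python sketch. Everything here is hypothetical: the class name, counting by IP address, and the fixed interval of 40 are assumptions for illustration, and a real version would probably add some random jitter to the interval so the captcha is harder to predict.

```python
from collections import defaultdict


class CaptchaGate:
    """Hypothetical sketch: demand a captcha every `interval` page views."""

    def __init__(self, interval=40):
        self.interval = interval
        self.views = defaultdict(int)  # page views counted per IP
        self.pending = set()           # IPs that still owe a solved captcha

    def needs_captcha(self, ip):
        """Count one page request and say whether to show a captcha instead."""
        if ip in self.pending:
            return True  # no more content until the captcha is solved
        self.views[ip] += 1
        if self.views[ip] % self.interval == 0:
            self.pending.add(ip)
            return True
        return False

    def solved(self, ip):
        """Call this once the visitor has typed the captcha text correctly."""
        self.pending.discard(ip)


gate = CaptchaGate(interval=40)
shown = [gate.needs_captcha("10.0.0.1") for _ in range(40)]
print(shown.count(True))  # only the 40th view triggers the captcha
```

A human hits this once in a blue moon and moves on after calling `solved`; a bot that can't answer just keeps getting the captcha page.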

Of course there are blow-thru captcha tricks where scrapers ask humans on other sites to enter the captcha data needed to get past these traps, but setting a series of random traps with random code on random page names might just make it too hard for the scrapers to accurately identify a captcha and they'll simply download hundreds of captcha pages instead of content.
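
The "random page names" part could be as simple as the following Python sketch. The function name and the URL layout are invented for illustration; the only point is that the captcha and trap pages get names nobody can hard-code into a scraper.

```python
import secrets


def make_trap_paths(count):
    """Hypothetical sketch: generate `count` distinct, unguessable page names."""
    paths = set()
    while len(paths) < count:
        # 6 random bytes -> 12 hex characters, e.g. "/3f9a0c1b7e42.html"
        paths.add("/" + secrets.token_hex(6) + ".html")
    return paths


for path in sorted(make_trap_paths(3)):
    print(path)
```

Regenerate the set every so often and yesterday's list of pages to avoid is worthless to the scraper.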

More crazy ideas coming as this episode of web warfare evolves.


baraqyal said...

You could go to a pure flash site. Seems like that's the big trend in web design.

IncrediBILL said...

No fucking way -
I'm not trying to make it completely useless, just hard to scrape

Did my logo in flash and that even makes me want to puke