Saturday, February 25, 2006

Cleverly Masked Bots Evolving

It would appear that the war over my content has been cranked up a level, as bots masquerading as browsers and modifying their behavior to look like people seem to be on the rise. There are still some tell-tale signs that are easy to spot when you look at the server log, but a couple of bots that the bot blocker didn't catch are finding ways to game the system.

I didn't want to make the site more difficult for visitors, but the only way to stop these guys would appear to be tossing in more random challenges, like captchas and such, after a pre-determined number of pages. To stop the typical captcha blow-thrus, the challenges are very random. Nobody could program a way to bypass them all, since they don't know what they all are, and I can add new ones daily if I want.
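To make the idea concrete, here is a minimal sketch of a page counter that hands out a randomly chosen challenge once a visitor passes the limit. It is not the actual bot blocker: the `ChallengeGate` class, the challenge names, and the limit of 20 pages are all assumptions made up for illustration.

```python
import random
from collections import defaultdict

# Hypothetical values: neither the limit nor the challenge names come from the real blocker.
PAGE_LIMIT = 20
CHALLENGES = ["image_captcha", "math_question", "pick_the_word"]


class ChallengeGate:
    """Counts page views per visitor and picks a random challenge past the limit."""

    def __init__(self, page_limit=PAGE_LIMIT, challenges=CHALLENGES, rng=None):
        self.page_limit = page_limit
        self.challenges = list(challenges)
        self.rng = rng or random.Random()
        self.page_counts = defaultdict(int)

    def record_page(self, visitor_id):
        """Count one page view; return a challenge name past the limit, else None."""
        self.page_counts[visitor_id] += 1
        if self.page_counts[visitor_id] > self.page_limit:
            # A random pick means a bot cannot prepare for one fixed challenge type.
            return self.rng.choice(self.challenges)
        return None

    def challenge_passed(self, visitor_id):
        """Reset the counter once the visitor has solved a challenge."""
        self.page_counts[visitor_id] = 0
```

Adding a new challenge in this sketch is just one more entry in the list, which is roughly what makes rotating them daily cheap.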

There's also something I noticed which isn't earth-shattering: only humans seem to use my JavaScript menu, which is a HUGE tell. Robots navigate the text links only, but humans love those drop-down lists, and that's a clear sign that differentiates the two most of the time.
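One way this signal could be checked on the server is sketched below. It assumes the menu tags its links with a `via=menu` query parameter so the server can tell menu clicks apart from text links. That parameter, the function names, and the 10-page minimum are all hypothetical, and since the tell only holds most of the time, the result is a hint rather than a verdict.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical marker: assumes menu links carry ?via=menu.
MENU_PARAM = "via"
MENU_VALUE = "menu"


def used_menu(url):
    """Return True if this request URL carries the (assumed) menu marker."""
    query = parse_qs(urlparse(url).query)
    return MENU_VALUE in query.get(MENU_PARAM, [])


def looks_like_bot(urls, min_pages=10):
    """Flag a visitor who viewed many pages without ever touching the menu."""
    if len(urls) < min_pages:
        return False  # too few pages to say anything either way
    return not any(used_menu(u) for u in urls)
```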

At the end of the day, it's just like trying to secure money in a bank: no matter how hard you try, someone is going to rob you eventually. The best you can hope for is to keep the number of times you get robbed to a minimum without pissing off all your customers in the process.


Jon said...

Your battle with bots reminds me a lot of what it must be like day in and day out for Google, trying to tweak their algo to thwart spammers.

I can only hope that you can come up with a decent solution that doesn't hurt the user experience. If you do, it would make for an excellent presentation at pubcon.

IncrediBILL said...

Thanks Jon.

So far I'm busting bots on so many thousands of pages it's incredible, and I've probably only nailed a couple of real users: people with Firefox who try to use my RSS feed with "Open In Tabs" get a kick in the pants after the 20th page.

BTW, tried your SEO Analyzer tool for giggles and my bot blocker doesn't allow Lynx.

02/25/2006 00:00:00 BAD_AGENT "Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7e" "/"

OOPS! ;)
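
For anyone wanting to pick entries like that one apart, here is a rough parsing sketch. The layout (timestamp, reason code, quoted user agent, quoted path) is inferred from the single line shown above, so treat the pattern and the `parse_block_entry` name as assumptions rather than the blocker's real log format.

```python
import re
from datetime import datetime

# Pattern inferred from one sample line; the real log format may differ.
LOG_PATTERN = re.compile(
    r'^(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) '
    r'(?P<reason>\S+) '
    r'"(?P<agent>[^"]*)" '
    r'"(?P<path>[^"]*)"$'
)


def parse_block_entry(line):
    """Split one block-log line into its fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The sample date 02/25/2006 can only be month/day/year.
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%m/%d/%Y %H:%M:%S")
    return entry
```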