Wednesday, December 28, 2005

The Great Anti-Scrape-Off

Previous posts have mumbled about the new spider-trap anti-scraper tool I added to my web site, and I must say it's working so well that I'm contemplating converting it to PHP so the masses can play with it.

Don't hold your breath, though, as I'm fundamentally lazy.

The features are as follows:

  • Fast crawl auto-block
  • Slow crawl detection and optional blocking
  • Spider trap auto-block
  • Webmaster control panel that shows the last 15 minutes of live activity by visitor
  • Manual ban/block from the control panel
  • Allowed-spider pass-through with built-in passes for Google, Yahoo and MSN
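
To give a rough idea of how a few of these pieces might fit together, here is a minimal Python sketch. It is not the actual tool: the thresholds, the /trap/ path, and the class and method names are all made up for illustration, and the allowed-spider check only matches user-agent strings, which a scraper can fake.

```python
import time
from collections import defaultdict, deque

# Everything below is hypothetical and chosen for illustration only.
FAST_CRAWL_LIMIT = 10       # max requests allowed inside one window
FAST_CRAWL_WINDOW = 5.0     # window length in seconds
TRAP_PATH = "/trap/"        # hidden link that only a crawler would follow
ALLOWED_SPIDERS = ("googlebot", "slurp", "msnbot")  # Google, Yahoo, MSN


class AntiScraper:
    def __init__(self):
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned = set()

    def ban(self, ip):
        """Manual ban, as a control panel might trigger."""
        self.banned.add(ip)

    def check(self, ip, path, user_agent, now=None):
        """Return True if the request may proceed, False if it is blocked."""
        now = time.time() if now is None else now

        # Bans (automatic or manual) win over everything else.
        if ip in self.banned:
            return False

        # Allowed spiders pass through. User-agent match only, so spoofable.
        if any(name in user_agent.lower() for name in ALLOWED_SPIDERS):
            return True

        # Spider trap: anything that requests the hidden path gets blocked.
        if path.startswith(TRAP_PATH):
            self.banned.add(ip)
            return False

        # Fast crawl: too many requests inside the sliding window.
        window = self.hits[ip]
        window.append(now)
        while window and now - window[0] > FAST_CRAWL_WINDOW:
            window.popleft()
        if len(window) > FAST_CRAWL_LIMIT:
            self.banned.add(ip)
            return False

        return True
```

A page handler would call something like `guard.check(ip, path, user_agent)` before serving content and return an error page when it comes back False. Slow crawl detection would need a longer history per visitor than this sliding window keeps, so it is left out of the sketch.
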

Overall it has put the skids on 99.9% of the scrapers and off-topic bots crawling my site within 2 days of being fully deployed. Before that I was just monitoring what it was doing, or was going to do, to make sure it wasn't zapping legitimate spiders and visitors. It now seems ready for prime time, so I set it to LIVE a couple of days ago with fingers crossed.

The new enhancements I'm working on for next week aren't spider related but visitor related. These new 'sensors' will detect additional information per visitor, such as whether cookies, JavaScript and banner blocking are in use, so I can remove the old hacks that do some of this and build a centralized visitor knowledge base.

The final features in the visitor knowledge base will let me dynamically deploy an advertising model that each visitor can actually view by the time the second page is loaded, or theoretically let me start inserting intermission pages inviting the visitor to become a "subscriber".

So far so good. I'll keep everyone posted on the effectiveness of the anti-scraper tools.

