The basic concepts I'm using to catch rogue bots and scrapers are simple:
- Bogus pages reached only through fake links, which well-behaved bots will ignore and rogue bots will follow
- Tracking frequency of page access to detect rapid downloads
- Tracking total pages accessed in a 24-hour period to detect high-volume downloads
- Some secret herbs and spices I won't divulge
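To make the first three ideas concrete, here is a minimal Python sketch of how they could fit together. It is not the actual code behind this experiment: the class name `BotDetector`, the trap path `/trap/index.html`, and every threshold are made-up values for illustration only, and the secret herbs and spices are left out entirely.

```python
from collections import defaultdict, deque

# Hypothetical trap page. The assumption is that it is linked only from
# hidden links and disallowed in robots.txt, so polite bots never request it.
HONEYPOT_PATHS = {"/trap/index.html"}


class BotDetector:
    """Flags clients by honeypot hits, burst rate, and 24-hour volume."""

    def __init__(self, burst_limit=20, burst_window=10.0,
                 daily_limit=1000, daily_window=24 * 60 * 60):
        # All limits are illustrative assumptions, not tuned values.
        self.burst_limit = burst_limit
        self.burst_window = burst_window
        self.daily_limit = daily_limit
        self.daily_window = daily_window
        self.hits = defaultdict(deque)  # ip -> request timestamps, oldest first

    def check(self, ip, path, now):
        """Return a reason string if the request looks rogue, else None."""
        if path in HONEYPOT_PATHS:
            return "honeypot"

        hits = self.hits[ip]
        hits.append(now)

        # Forget requests older than the 24-hour window.
        while hits and now - hits[0] >= self.daily_window:
            hits.popleft()
        if len(hits) > self.daily_limit:
            return "daily volume"

        # Count requests inside the short burst window, newest first.
        recent = 0
        for t in reversed(hits):
            if now - t >= self.burst_window:
                break
            recent += 1
        if recent > self.burst_limit:
            return "rapid access"
        return None
```

Calling `check(ip, path, now)` on each request, with `now` as a timestamp in seconds, returns `None` for normal traffic and a short reason string otherwise. Because old timestamps simply age out of the window, nothing in this sketch amounts to a permanent ban.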
The neat thing is that I don't have to permanently ban the rogue bots that get trapped, since they'll just fall into another honeypot when they return or switch to a different proxy server.
This should be an interesting experiment, and I'll keep you all posted after testing it for a weekend to see how effective it is. Maybe I'll set up a honeypot service if this works!