Since launching my scraper stopper, all sorts of little bots have been caught and categorized. The sheer number of them is staggering, and it doesn't seem to stop, with new ones popping up daily.
My initial pass at blocking bots was an automated snare to stop them in real time, then let me review the catch and install a permanent IP block if I wanted, or simply add the crawler's user agent string to my blacklist and block any occurrence from any IP, or all of the above.
Then I had an epiphany: the blacklist approach is simply too much work, as there are far too many bots out there for any one person to sort out and ban, even with the assistance of automation to stop them in real time.
My new approach, sweet and simple, is just the opposite: a whitelist. Now all bots are banned by default, with only the ones I deem worthy being added to the whitelist after the fact.
Compare the difference in the two approaches:
- Previously: blacklisted bots in the hundreds and growing
- Currently: whitelisted bots, fewer than 10
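The deny-by-default idea above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the whitelist entries and the `is_allowed` helper are hypothetical, and a real setup would apply this check only to requests already identified as crawlers, not to ordinary browsers.

```python
import re

# Hypothetical whitelist of crawler user agent patterns.
# Everything that is not matched here is treated as banned.
WHITELIST = [
    re.compile(r"Googlebot", re.IGNORECASE),
    re.compile(r"bingbot", re.IGNORECASE),
]

def is_allowed(user_agent: str) -> bool:
    """Deny by default; allow only whitelisted crawlers."""
    return any(p.search(user_agent) for p in WHITELIST)

print(is_allowed("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # allowed
print(is_allowed("SomeRandomScraper/0.1"))                    # banned
```

The key inversion is that the default answer is "no": a brand-new scraper needs no rule to be blocked, whereas under the blacklist approach every new scraper needed its own rule.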