Here's a really sneaky scraper using distributed IPs, running a bot that almost appears designed to fly under my bot blocker's radar. No single IP address accessed enough pages, or did anything obnoxious enough, to set off any triggers, but the collective accesses set off a proximity alarm and they got nailed anyway.
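The "proximity alarm" idea can be sketched in Python: count hits per neighbouring block of addresses as well as per individual IP. This is a minimal illustration under stated assumptions, not the bot blocker's actual code. The /24 grouping, the thresholds, and the function names are all hypothetical, and the sample traffic uses documentation address space.

```python
import ipaddress
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
PER_IP_LIMIT = 50      # pages one IP may fetch before it trips an alarm
PER_BLOCK_LIMIT = 200  # pages a whole /24 may fetch collectively


def subnet_of(ip, prefix=24):
    """Return the enclosing network, e.g. 192.0.2.17 -> 192.0.2.0/24."""
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def find_alarms(requests, per_ip=PER_IP_LIMIT, per_block=PER_BLOCK_LIMIT):
    """Flag single IPs over per_ip hits and /24 blocks over per_block hits.

    `requests` is an iterable of client IP strings, one entry per page hit.
    """
    requests = list(requests)
    ip_hits = Counter(requests)
    block_hits = Counter(subnet_of(ip) for ip in requests)
    ip_alarms = {ip for ip, n in ip_hits.items() if n > per_ip}
    block_alarms = {net for net, n in block_hits.items() if n > per_block}
    return ip_alarms, block_alarms


# Ten addresses in one /24, 30 page hits each: no single IP exceeds 50,
# but the block as a whole makes 300 requests and trips the block alarm.
hits = [f"192.0.2.{i}" for i in range(1, 11) for _ in range(30)]
print(find_alarms(hits))
```

Grouping by /24 is just one possible notion of "proximity"; a real blocker might also group by owning organization or by reverse-DNS domain.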
The scraper is pretending to be Firefox for Linux:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:18.104.22.168) Gecko/20060124 Firefox/22.214.171.124
The range of IPs noticed in this scrape attack is as follows:
126.96.36.199 ip-65-38-102-138.hou.vericenter.com
The host information is as follows:
OrgName: VeriCenter, Inc.
Address: 757 N Eldridge Parkway
NetRange: 184.108.40.206 - 220.127.116.11
Although the attack seems to be centered on the 188.8.131.52/24 block at Vericenter's Houston datacenter, I think I'm going to block Vericenter completely, since it doesn't appear to have any ISP facilities [i.e. NO HUMANS], and see if anything else from their facilities bounces off the bot blocker.
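Blocking the whole provider rather than a single /24 amounts to blocking its entire whois NetRange, which whois reports as a "first - last" pair rather than in CIDR notation. A minimal sketch, assuming Python's standard `ipaddress` module, converts such a pair into CIDR blocks and checks visitors against them. The function names are hypothetical, and the range used is made up from documentation address space, not Vericenter's real NetRange.

```python
import ipaddress


def range_to_networks(first, last):
    """Convert a whois 'NetRange: first - last' pair into CIDR blocks."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return list(ipaddress.summarize_address_range(start, end))


def is_blocked(ip, networks):
    """True if `ip` falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Hypothetical NetRange taken from documentation address space.
blocked = range_to_networks("198.51.100.0", "198.51.101.255")
print(blocked)                               # a single /23 covering both /24s
print(is_blocked("198.51.101.42", blocked))  # inside the range
print(is_blocked("198.51.102.1", blocked))   # just outside it
```

A NetRange need not line up with a single CIDR block, which is why `summarize_address_range` can return several networks and the check loops over all of them.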