In the badly behaving corporate bots dept. we offer Netsweeper as our newest entry from Canada. They run one of those content filtering companies that think they should be allowed to crawl your site no matter what, just to protect their clients.
Sorry, but we happen to disagree with all these content filtering spiders that feel the need to crawl without any regard for robots.txt, and we really don't need a whole buttload of content filtering companies scanning the fucking web.
Yes, I threw in the word fucking just so your asshole spider will flag this post as bad content and none of your goddamn customers can read it, so blow that out your ass.
Let's see what Netsweeper runs:
22.214.171.124 "webcollage/1.127"
126.96.36.199 "NutchCVS/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; email@example.com)"
188.8.131.52 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
These IP addresses have the following host names:
184.108.40.206 -> firewall.net-sweeper.com
220.127.116.11 -> host227.net-sweeper.com
Let's just cut thru the chase and here's the information to block their ass:
CustName: Netsweeper
Address: 4-512 Woolwich Street
NetRange: 18.104.22.168 - 22.214.171.124
Ta ta Netsweeper, you've been blocked and swept under my rug.
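For anyone who'd rather do the blocking in application code than at the firewall, here's a minimal sketch in Python of checking a visitor's IP against a start-to-end range like the NetRange line in a WHOIS record. The names BLOCK_START, BLOCK_END, blocked_networks and is_blocked are made up for illustration, and the range in the code is a reserved documentation placeholder, not Netsweeper's actual allocation, so swap in the real range from your own WHOIS lookup.

```python
import ipaddress

# Placeholder range from the RFC 5737 documentation block.
# This is an assumption for illustration, NOT the real Netsweeper NetRange.
BLOCK_START = ipaddress.ip_address("192.0.2.0")
BLOCK_END = ipaddress.ip_address("192.0.2.255")


def blocked_networks(start, end):
    """Collapse a start-to-end address range into a list of CIDR networks."""
    return list(ipaddress.summarize_address_range(start, end))


def is_blocked(ip, networks):
    """Return True if the given IP string falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


NETWORKS = blocked_networks(BLOCK_START, BLOCK_END)

if __name__ == "__main__":
    for visitor in ("192.0.2.17", "198.51.100.1"):
        verdict = "blocked" if is_blocked(visitor, NETWORKS) else "allowed"
        print(visitor, verdict)
```

Converting the range to CIDR networks up front also gives you something you can paste straight into most firewall or web server deny rules.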