OK, this must be a clue that my bot blocker has graduated to the head of the class, as within 24 hours I've snared 2 corporations bypassing security measures by pretending to be browsers.
Remember what I said about bot blocking being an onion that you keep peeling layer by layer?
The next one in our list of sneaky snoopers is Cyveillance, which apparently has been around for a while but went unnoticed until I cranked up the level of bot profiling on my site just a bit to see if I was missing anyone. BINGO! I got 2 big fish in a day by looking at the next layer of the onion.
According to what I've been reading at linuXgod's site, these boys spy for the RIAA, the government and god knows who else, for god knows what purposes. He's been trying to get them to stop crawling his site via a short back and forth of emails, and they don't seem to be interested in complying.
My favorite quote is where they justify ignoring internet standards like robots.txt and masking the user agent string as a browser, "Mozilla/4.0 (compatible; MSIE 6.1; Windows XP)":
"Because many sites use redirection pages to route robots to special "indexing" pages, we identify our web crawler as an IE browser to ensure it receives the same content as the majority of web surfers on the internet and to allow our programmers to concentrate on a single interpretation of the html standard."
Well hell, doesn't that logic just make it fucking OK to ignore whether I want your robot on my server in the first place?
So you're justified in bypassing my security to stop browsers just to concentrate on a single html standard?
Well guess what, NO, YOU'RE NOT JUSTIFIED!
Here you go, people: the ranges of IPs, so block them, as we're not being given any other means to detect this crawler:
whois 18.104.22.168
Cyveillance QWEST-63-148-99-224 (NET-63-148-99-224-1)
22.214.171.124 - 126.96.36.199
CYVEILLANCE UU-65-213-208-128-D4 (NET-65-213-208-128-1)
188.8.131.52 - 184.108.40.206
Wish I had the bot blocker commercialized now to go mainstream and nail this nonsense.
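For anyone who wants to act on those ranges, here's a minimal sketch of the kind of check I mean, written in Python. This is not my actual bot blocker, just an illustration. The names BLOCKED_RANGES and is_blocked are made up for this example, and the addresses in it are placeholder documentation ranges, so swap in the first and last address of each range from the whois output before using it.

```python
import ipaddress

# PLACEHOLDER ranges (reserved documentation addresses), NOT the crawler's
# real ranges. Replace each pair with the first and last address reported
# by whois for the network you want to block.
BLOCKED_RANGES = [
    ("192.0.2.224", "192.0.2.255"),
    ("198.51.100.128", "198.51.100.191"),
]


def _to_networks(first, last):
    """Turn a first-last address pair into the CIDR networks that cover it."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


# Flatten every range into one list of networks, computed once at import.
BLOCKED_NETWORKS = [
    net for first, last in BLOCKED_RANGES for net in _to_networks(first, last)
]


def is_blocked(remote_addr):
    """Return True if the visitor's IP falls inside any blocked range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address; this sketch lets it through.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Call is_blocked() with the remote address of each request and send back a 403 when it returns True. Matching on IP is crude, but when the crawler lies about its user agent and ignores robots.txt, the address is the only thing left to go on.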