Saturday, November 10, 2007

Websense Stealth Crawler Bypassing Security?

What I find amusing are security companies that claim to be protecting the web while violating access control measures on web servers all over the world.

Here's the obvious traffic I see coming from WebSense:

208.80.193.29 Mozilla/5.0 (compatible; Konqueror/3.0-rc2; i686 Linux; 20020108)
208.80.193.30 Mozilla/5.0 (compatible; Konqueror/3.0-rc4; i686 Linux; 20020418)
208.80.193.33 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312462)
208.80.193.34 Mozilla/5.0 (compatible; Konqueror/3.1; i686 Linux; 20020213)
208.80.193.36 Mozilla/5.0 (compatible; Konqueror/3.0-rc1; i686 Linux; 20020328)
208.80.193.37 Mozilla/5.0 (compatible; Konqueror/3.1-rc4; i686 Linux; 20020520)
208.80.193.41 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312466)
208.80.193.42 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312462)
208.80.193.51 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312460)
208.80.193.52 Mozilla/5.0 (compatible; Konqueror/3.0-rc6; i686 Linux; 20020204)
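
The pattern in the list above (one address block, many different user agents) is easy to spot mechanically. Here is a minimal sketch, assuming each log line is simply "IP, space, user agent" as shown; the function names, the /24 grouping and the threshold of 3 are illustrative choices, not anything WebSense or a particular log tool defines.

```python
from collections import defaultdict
import ipaddress

# A few lines copied from the list above: "<ip> <user agent>".
LOG_LINES = [
    "208.80.193.29 Mozilla/5.0 (compatible; Konqueror/3.0-rc2; i686 Linux; 20020108)",
    "208.80.193.33 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312462)",
    "208.80.193.42 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312462)",
    "208.80.193.52 Mozilla/5.0 (compatible; Konqueror/3.0-rc6; i686 Linux; 20020204)",
]


def agents_by_subnet(lines, prefix_len=24):
    """Map each /prefix_len network to the set of user agents seen from it."""
    seen = defaultdict(set)
    for line in lines:
        ip_text, _, agent = line.partition(" ")
        # strict=False lets us build the enclosing network from a host address.
        network = ipaddress.ip_network(f"{ip_text}/{prefix_len}", strict=False)
        seen[network].add(agent)
    return seen


def suspicious_subnets(lines, min_agents=3):
    """Return networks that presented at least min_agents distinct user agents."""
    return {
        str(network): len(agents)
        for network, agents in agents_by_subnet(lines).items()
        if len(agents) >= min_agents
    }


print(suspicious_subnets(LOG_LINES))
```

A real access log has more fields per line, so the parsing step would need adjusting to whatever log format the server writes.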
It makes me wonder whether deliberately trying to bypass security measures designed to keep robots like WebSense off a server, such as robots.txt, .htaccess and other access controls, may violate the "Computer Hacking and Unauthorized Access Laws"?

Proving they've been busily sneaking around on lots of servers won't be too hard either.

Maybe WebSense should just declare any site that blocks them off limits, since we don't want them on our servers, instead of trying to circumvent our security measures.

That would make too much sense, wouldn't it?

Of course someone could claim that bad sites would just cloak clean content if they know it's WebSense. However, I'd rather give explicit permission for WebSense and then it wouldn't bother me so much if they crawled in stealth from a different IP address knowing that I gave permission in the first place.

Here are some of their known IP ranges:
Websense 66.194.6.0 - 66.194.6.255
Websense 74.211.167.208 - 74.211.167.215
Websense, Inc 208.80.192.0 - 208.80.199.255
I'm not sure these belong to the same company, as a couple are in Canada and the other is in a different city, but what the heck, make up your own mind on these:
Websense Inc 67.117.201.128 - 67.117.201.143
Websense Systems Inc. 64.69.80.104 - 64.69.80.111
Websense Systems Inc. 64.69.80.96 - 64.69.80.103
There you go, some good bot blocking to go with your morning coffee should start off a fine Monday!
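
Most blocking tools want CIDR notation rather than start and end addresses. Here is a minimal sketch of the conversion using Python's standard ipaddress module; the ranges are copied from the lists above, while the function names are made up for illustration.

```python
import ipaddress

# Start/end pairs copied from the lists above.
RANGES = [
    ("66.194.6.0", "66.194.6.255"),
    ("74.211.167.208", "74.211.167.215"),
    ("208.80.192.0", "208.80.199.255"),
    ("67.117.201.128", "67.117.201.143"),
    ("64.69.80.104", "64.69.80.111"),
    ("64.69.80.96", "64.69.80.103"),
]


def to_cidrs(ranges):
    """Convert (first, last) address pairs into a merged list of networks."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    # collapse_addresses merges adjacent or overlapping networks.
    return list(ipaddress.collapse_addresses(networks))


def is_listed(ip_text, networks):
    """True if the address falls inside any of the given networks."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in network for network in networks)


NETWORKS = to_cidrs(RANGES)
for network in NETWORKS:
    print(network)
```

The two adjacent 64.69.80.x ranges collapse into a single network, so the output has one entry fewer than the input list.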

6 comments:

Johann said...

Nice list. You are missing 204.15.64.0/21, and I was missing 74.211.167.208.

I blogged about WebSense before in "Corporate web abuse: The worst offenders from Cyveillance to PicScout".

proxy said...

Just add their IP range to your .htaccess and give them a 403.

-regards
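
A minimal sketch of what that suggestion could look like, generating the .htaccess lines from a list of CIDR blocks. It assumes the older Apache 2.2 style Order/Allow/Deny directives (Apache 2.4 uses Require directives instead), and the three CIDR blocks are conversions of the first three ranges in the post, not something the post itself lists in that form. The function name is hypothetical.

```python
import ipaddress

# Assumed CIDR equivalents of the first three ranges listed in the post.
WEBSENSE_CIDRS = ["66.194.6.0/24", "74.211.167.208/29", "208.80.192.0/21"]


def htaccess_deny_block(cidrs):
    """Build Apache 2.2 style directives that answer listed networks with 403."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for cidr in cidrs:
        # Raises ValueError on a malformed or misaligned block, catching typos early.
        network = ipaddress.ip_network(cidr)
        lines.append(f"Deny from {network}")
    return "\n".join(lines) + "\n"


print(htaccess_deny_block(WEBSENSE_CIDRS))
```

With Order Allow,Deny, a request that matches a Deny line is rejected even though it also matches "Allow from all", which is what produces the 403.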

Anonymous said...

Websense is the dumbest thing on the Web since Npbot, RuleSpace and all of these other crazy companies that promise things that they can't keep.

As you've implied in your entry, for any company to "deliver" on their promises, they would need to get around security measures deployed by most webmasters to keep abusive companies like Websense off their sites. Companies that pay for Websense's services and think that Websense can deliver on its promises must be living in another dimension.

FYI, Websense uses all types of user agent strings to bot sites. They also use something called Google Web Accelerator and various IPs from Europe, North America (including from ^38.), Latin America and Asia to abuse your site.

If you have evidence that Websense has violated user agreements or attempted to bypass security measures (including hacking) to obtain web content, you have the right to contact an attorney, as well as the U.S. Department of Justice. The USDOJ has a computer crime section and if you have the data against Websense (especially hacking attempts), I don't see why they wouldn't listen to what you have to say.

However, the main point I want to make is that people and companies who pay for services like this are totally misinformed about the capabilities of a company like Websense. The joke is on all of them.

Anonymous said...

Websense appears to have intensified its efforts to gain access to my sites, using so many different user agent strings and so many hits a day, and filling my logs with 208.80.193.XXX, that it must be time for conditional logging.

Yes, Websense, that little tool that will take all hits from your company (that specifically violates my copyright on file with the U.S. Copyright Office) and place them in a special little Websense file marked "hell" that will be deleted every 24 hours like clockwork.

Yes Websense, your hits will not be recorded and it will be, well, like you are not really there at all. I will miss seeing several of your stupid user agent strings since several of them gave me a good laugh each day.

For kicks now I will check in, from time to time, to look at WBSN on the boards.

I bid your company lots of luck in an extremely demanding economic environment.
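
The conditional logging idea described above can be sketched outside the web server too. This is a minimal illustration, assuming every log line starts with the client IP; it only separates lines into two lists, and the function name, the sample lines and the 192.0.2.10 address are made up for the example. Writing the quarantined list to its own file and deleting that file daily would be a separate step.

```python
import ipaddress

# The 208.80.193.XXX block mentioned in the comment.
QUARANTINE_NET = ipaddress.ip_network("208.80.193.0/24")


def split_log(lines, network=QUARANTINE_NET):
    """Return (kept, quarantined); assumes each line starts with the client IP."""
    kept, quarantined = [], []
    for line in lines:
        parts = line.split(None, 1)
        first_field = parts[0] if parts else ""
        try:
            hit = ipaddress.ip_address(first_field) in network
        except ValueError:
            # Not an IP address (blank or malformed line): keep it in the main log.
            hit = False
        (quarantined if hit else kept).append(line)
    return kept, quarantined


sample = [
    "208.80.193.29 GET /index.html",  # hypothetical request from the listed block
    "192.0.2.10 GET /index.html",     # hypothetical ordinary visitor
]
kept, quarantined = split_log(sample)
print(len(kept), "kept,", len(quarantined), "quarantined")
```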

John said...

I'm impressed by how persistent WebSense is when it comes to stroking my web site. I get tens of hits a day, each with a different UA.
They seem to like MSIE and Konqueror UAs.
Where's the loving for Firefox?
At the moment I just log all their hits and then drop the connection with an HTTP error.
And my site is a WML/XHTML-MP site and it's like WebSense is sooo going to add to my client base....not.

agrif said...

robots.txt was never intended as access control, and in fact robotstxt.org says specifically that listing paths in robots.txt is a great flag for malware. The file is only supposed to be used as a recommendation. Basically, web crawlers are evil if they ignore it, but it is by no means illegal.

As stated before, use some form of server-side authentication or filtering for anything you actually need access control for.
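
One possible form of server-side filtering is a small WSGI wrapper that refuses listed networks before the application runs. This is a sketch under assumptions: the site is served by a Python WSGI application, REMOTE_ADDR holds the real client address (no proxy in front), and the single blocked network is the CIDR form of the 208.80.192.0 - 208.80.199.255 range from the post. The names block_listed_ips and BLOCKED are hypothetical.

```python
import ipaddress

# Assumed CIDR form of the 208.80.192.0 - 208.80.199.255 range from the post.
BLOCKED = [ipaddress.ip_network("208.80.192.0/21")]


def block_listed_ips(app, networks=BLOCKED):
    """Wrap a WSGI app so requests from listed networks get a 403 response."""

    def middleware(environ, start_response):
        try:
            client = ipaddress.ip_address(environ.get("REMOTE_ADDR", ""))
        except ValueError:
            client = None  # Missing or malformed address: let the app decide.
        if client is not None and any(client in net for net in networks):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Unlike robots.txt, this is enforced by the server, so a crawler cannot opt out by ignoring it, though it only covers address ranges that are already known.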