Friday, March 17, 2006

Spyveillance, Block 'em if you got 'em

OK, this must be a clue that my bot blocker has graduated to the head of the class, as I've snared 2 corporations within 24 hours bypassing security measures by pretending to be browsers.

Remember what I said about bot blocking being an onion that you keep peeling layer by layer?

The next one in our list of sneaky snoopers is Cyveillance, which apparently has been around for a while but went unnoticed until I cranked up the level of bot profiling on my site just a bit to see if I was missing anyone and BINGO! Got 2 big fish in a day by looking at the next layer of the onion.

According to what I've been reading at linuXgod's site, these boys spy for the RIAA, the government and god knows who else, for god knows what purposes. He's been trying to get them to stop crawling his site via a small back and forth of emails, and they don't seem to be interested in complying.

My favorite quote is where they justify ignoring internet standards like robots.txt and masking the user agent string as a browser, "Mozilla/4.0 (compatible; MSIE 6.1; Windows XP)":

"Because many sites use redirection pages to route robots to special "indexing" pages, we identify our web crawler as an IE browser to ensure it receives the same content as the majority of web surfers on the internet and to allow our programmers to concentrate on a single interpretation of the html standard."

Well hell, doesn't that logic just make it fucking OK to ignore whether I want your robot on my server in the first place?

So you're justified in bypassing my security to stop browsers just to concentrate on a single html standard?

Well guess what, NO, YOU'RE NOT JUSTIFIED!

Here you go people, the IP ranges, so block them, as we're not being given any other means to detect this crawler:
whois 63.148.99.239

Cyveillance QWEST-63-148-99-224 (NET-63-148-99-224-1)
63.148.99.224 - 63.148.99.255

and...

CYVEILLANCE UU-65-213-208-128-D4 (NET-65-213-208-128-1)
65.213.208.128 - 65.213.208.159

Wish I had the bot blocker commercialized now to go mainstream and nail this nonsense.
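For anyone who wants to act on those two ranges in their own code, here is a minimal sketch of a range check using Python's standard ipaddress module. The names BLOCKED_RANGES and is_blocked are made up for illustration; this is not how my bot blocker does it, just one plain way to test a visitor's address against the whois ranges listed above.

```python
import ipaddress

# First and last address of each range, copied from the whois output above.
BLOCKED_RANGES = [
    (ipaddress.ip_address("63.148.99.224"), ipaddress.ip_address("63.148.99.255")),
    (ipaddress.ip_address("65.213.208.128"), ipaddress.ip_address("65.213.208.159")),
]


def is_blocked(remote_addr):
    """Return True if remote_addr (a string) falls inside a blocked range."""
    addr = ipaddress.ip_address(remote_addr)
    if addr.version != 4:
        # The listed ranges are IPv4 only, so anything else is not matched here.
        return False
    return any(first <= addr <= last for first, last in BLOCKED_RANGES)
```

Call is_blocked() with the remote address of each request and refuse the request when it returns True.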

Corporate Crawler Masking as MSIE

Well, color me stunned, shocked and appalled, as I ran into an actual real live corporation with a legitimate product that is deploying a crawler that sets the user agent as MSIE, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; ....)".

Yeah, that's right, forget robots.txt, forget letting you block them by normal user agent filtering means, they're getting into your website whether you like it or not because they have MANIFEST DESTINY!

They are ENTITLED TO YOUR CONTENT!

Not.

These lovely sneaky snoopers that boldly bypass your firewalling efforts are Lightspeed Technologies, and they appear to be operating from this IP range: 66.17.15.128 - 66.17.15.191.

Just block them now, as this is about the lowest I've seen a corporate crawler get, and they should be blocked on principle alone for not honoring internet standards.
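Most firewalls and web server deny lists want CIDR notation rather than a first-last range, so here is a small sketch that converts the range above with Python's standard ipaddress module. The helper name range_to_cidrs is hypothetical; the range itself is the one quoted in this post.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last inclusive."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


# The range quoted above covers 64 addresses and collapses to one block.
print(range_to_cidrs("66.17.15.128", "66.17.15.191"))  # ['66.17.15.128/26']
```

Paste the resulting block into whatever deny mechanism your server or firewall provides.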

Thursday, March 16, 2006

Terms of Service vs. Fair Use

Here's my next thought about how to combat ill-behaved spiders that include snippets from your website and claim fair use. Include something in your TERMS OF SERVICE or LEGAL page on your website that prohibits unauthorized robots.

Therefore, even if they are within their rights of fair use, they've violated your terms of service and you possibly have an actionable item on your hands.

Thinking about running this one past a lawyer as we need some boilerplate text like the GNU license that can be distributed and used everywhere as leverage against scrapers.

Film @ 11

Tuesday, March 14, 2006

Related Movie Bullshit

If my last rant just moments ago about sucky movies didn't get the point across, I was just reading some entertainment news and J Lo may star in a movie adaptation of Dallas and Ice Cube may star in the big screen version of Welcome Back Kotter.

Dallas?

Welcome Back Kotter?

You must be shitting me!

IT'S OBVIOUS WHY PEOPLE DON'T GO TO THE FUCKING MOVIES YOU MORONS!

Looks like I'll need to take up golf or some shit since I obviously won't be at the movies anymore.

Hollywood Blames Piracy Instead of SUCKY MOVIES

This rant is rated TV-MA for fucking language.

Pay attention Hollywood, just sit down and listen the fuck up, it's not piracy that's stopping movie goers from watching films and buying DVDs, it's the long stream of non-stop shit you've been cranking out this year stopping people from wasting their money. In case you missed it the first time, listen up you shithead demographic chanting morons, it's your BULLSHIT MOVIES keeping my money in my pocket, not piracy, not video rentals, not On Demand, not cable TV. Don't put the blame on anything but your garbage product as it's ALL YOU, nothing else, causing your decline.

My wife and I used to go see movies 1 or 2 times a week and in the last year we've been having a real hard time finding anything worth wasting money on 1 time a month so obviously we're stealing shit instead of paying to watch shit according to your theory just because you couldn't make a movie worth 2 thumbs up your ass most of the year.

Other than Walk the Line, Match Point, Syriana, and Good Night, and Good Luck, plus a few others I can't remember at the moment, the choices have been real fucking slim this year.

Nothing fun stood out, like that anticipated sequel to The First Wives Club those bastards shelved. Give me Goldie, Bette and Diane before I get pissed or one of them has a stroke! Nor did they show anything outstanding like American Beauty. As a Nicholson fan, I could use more films like Something's Gotta Give, About Schmidt and As Good As It Gets, and I couldn't care less if Jack stars in them either, just good quality movies you can watch!

Not to mention the massive vacuum of any real stand-out superhero flicks in a while but the new Superman is on the way, supposedly. Nor have we seen any good SciFi / space epics in a loooooong while but TV's Friday night SciFi line-up is kicking their asses anyway so if you decide to make a new space movie it better kick ass like the original Alien or be fun like Starship Troopers, something special because people running around a rusty bucket ship shooting each other over some fucking conspiracy in a low budget space movie is BOOOORING.

I like to go out, I like a night at the movies, I'd go twice a week easily, but I refuse to ruin my night watching any old shit you think I'll pay for because you're WRONG WRONG FUCKING WRONG so make a good movie or just shut the fuck up and go broke already.

BTW, while I have your attention, spread out the goddamn movie times you assholes. People are still getting off work and want to have dinner when all the movies are starting at 7pm and most people move on and do other things before the next showing at 10pm. It's stupid, it's always been stupid, it will continue to be stupid and you can lose more business until you stagger showtimes a bit more so working slobs can see movies at 8pm and 9pm which is more reasonable.

So BLAME PIRACY when you continue to MAKE SHITTY MOVIES and continue with showtimes on CRAPPY SCHEDULES so.... FUCK YOU, FUCK YOUR PROFITS and STOP FUCKING WHINING YOU RICH HOLLYWOOD GOLD-PLATED GATED MANSION LIVING MOTHER FUCKERS just FUCKING FUCK YOU!

Just make me some good movies and we won't have a talk again, ok?

Monday, March 13, 2006

This Means War

While I was out to lunch this afternoon some nitwits that I've banned over and over trotted out a new IP address and tried to scrape 1,000 pages when nobody was watching.

Sorry pals, someone WAS watching, it was my little silent sentinel buddy that I wrote myself that blocked your ass after about 20 pages and sent you a nice whopping 900+ pages of error messages.
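For the curious, the general idea of a sentinel like that can be sketched as a per-IP page counter over a sliding time window. This is an illustration only, not the code running on my server: the class name ScrapeSentinel, the 60 second window and the injectable clock are all assumptions, and only the limit of roughly 20 pages comes from this post.

```python
import time
from collections import defaultdict, deque

PAGE_LIMIT = 20        # from the post: blocked "after about 20 pages"
WINDOW_SECONDS = 60.0  # assumption: the post does not give a time window


class ScrapeSentinel:
    """Hypothetical sentinel: ban an IP that requests too many pages too fast."""

    def __init__(self, page_limit=PAGE_LIMIT, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.page_limit = page_limit
        self.window = window
        self.clock = clock                # injectable so it can be tested
        self.hits = defaultdict(deque)    # ip -> timestamps of recent requests
        self.banned = set()

    def allow(self, ip):
        """Return True to serve the page, False to send an error instead."""
        if ip in self.banned:
            return False
        now = self.clock()
        recent = self.hits[ip]
        recent.append(now)
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.page_limit:
            self.banned.add(ip)
            del self.hits[ip]
            return False
        return True
```

With these settings, a scraper firing off 1,000 requests in one burst gets 20 pages and 980 refusals, while a visitor who stays under the limit is never touched.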

I think I've had enough of your shit though and perhaps it's time we see what ABUSE@SCRAPERHOSTING.COM has to say about your repeated attacks on my server.

Hope you get shut down or at a minimum find yourself in bed with Lorena Bobbitt and wake up with a Frankendick.

Sunday, March 12, 2006

Knuckle Scraping Neanderthal

When a scraper reads your robots.txt file don't you think they would avoid the disallowed pages and directories?

Then would you believe the scraper reads your robots.txt file a SECOND time after just downloading a few pages and immediately opens the page that it's told to leave alone and WHAMMO! gets stopped?

How FUCKING STUPID can you be to write such brain damaged code?
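A trap like that is easy to sketch: disallow a path in robots.txt, then block any visitor that requests it anyway. The example below uses Python's standard urllib.robotparser to decide whether a path is disallowed. The /trap/ path, the sample robots.txt and the handle_request function are hypothetical, and this is a rough illustration rather than my actual blocker.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one directory that no well-behaved robot visits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = set()  # IPs that walked into a disallowed path


def handle_request(ip, path, user_agent="SomeBot"):
    """Return an HTTP status code: 200 to serve the page, 403 to refuse it."""
    if ip in blocked:
        return 403
    if not parser.can_fetch(user_agent, path):
        # The visitor asked for something robots.txt told it to leave alone.
        blocked.add(ip)
        return 403
    return 200
```

Once an address trips the trap, every later request from it gets refused, including pages that would otherwise be allowed.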

Chitika ContentHit IPs

Chitika took another shot at my server with the user agent "Chitika ContentHit 1.0" but this time tried a whole bunch of IPs on a single web page, one that Chitika doesn't even appear on, which was most amusing.

  • 67.15.219.3
  • 67.15.219.11
  • 67.15.219.18
  • 67.15.219.14
  • 67.15.219.15
  • 67.15.219.10
  • 67.15.219.9
  • 67.15.219.16
  • 67.15.219.17

So there you have it, let 'em in or block their ass at your leisure.
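If you do decide to block, this crawler at least announces itself, so a check on the user agent plus the addresses listed above covers both cases. A minimal sketch follows; the names CHITIKA_UA, CHITIKA_IPS and is_chitika are made up for illustration, and the IP set is only what showed up in my logs, not a complete list.

```python
# Substring of the user agent string quoted in this post.
CHITIKA_UA = "Chitika ContentHit"

# The addresses seen in this post; the crawler may well use others.
CHITIKA_IPS = {
    "67.15.219.3", "67.15.219.11", "67.15.219.18",
    "67.15.219.14", "67.15.219.15", "67.15.219.10",
    "67.15.219.9", "67.15.219.16", "67.15.219.17",
}


def is_chitika(user_agent, remote_addr):
    """Return True if the request matches the user agent or a listed IP."""
    return CHITIKA_UA in user_agent or remote_addr in CHITIKA_IPS
```

Matching on either signal means a request still gets caught if the user agent changes but the address does not, or the other way around.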