Monday, April 24, 2006

Crawl my site, go to jail, it's the law.

Who the hell put open season on my website?

Not talking about this blog, but the website that pays my bills and shit.

It's NOT a free articles site. Copyright notices are plainly posted, technology is in place to stop assholes dead in their tracks, and they just keep coming.

Did someone post a sign on the internet that I've never seen that says:

The other day I wrote some dry-ass bullshit about data mining and then followed it up with some boring-ass statistics I've been collecting, but something has changed and the number of thieves hitting the site is suddenly going off the charts compared to a month ago.

Let's put this into perspective:

If my website were a grocery store and these assholes were looters or shoplifters, the shelves would be bare, the clerks would be guarding the doors with shotguns, the police would have the place surrounded and the parking lot blocked off, and a riot squad would be firing rubber bullets and cracking the skulls of people trying to get away with the loot.

The local jails, needless to say, would be overflowing.

I realize there is no physical theft involved like in the grocery store example above, and I also realize that putting something out on a public network invites a certain amount of risk, and lunatics from the fringe. But when you install barriers and roadblocks to stop that activity and they just keep coming, it's above and beyond what falls under 'normal access' and well into some serious realms of abuse and harassment.

But this is just 'copyright infringement' and 'bandwidth theft', right?

Well, maybe it is when you identify yourself as a robot and behave in an acceptable way that allows me, the webmaster, to stop you with reasonable efforts. However, when you mask yourself to conceal the nature of your visit, change the identity of the crawler, and then attempt to crawl undetected and bypass mechanisms such as firewalls put in place to stop your activity, then TECHNICALLY this becomes hacking.

Isn't hacking to gain unwanted access a FUCKING CRIME?

I'm thinking someone needs to make a test case of this: just bypass copyright altogether, try filing a criminal complaint against them for hacking, and see what happens.

I have a simple case I could use to test this theory already. Initially I put simple roadblocks in place for humans to get past, just waiting to see when the bandits would code around them, and sure enough they did a few months later. Then of course I made it harder, and I'm waiting to see if they'll take a shot at bypassing this roadblock as well. Don't worry, there are deadbolts I can install to keep them out if needed, but my current cat and mouse game is more fun and useful to learn from as they expose the level of their technology.

OK, since I put a lock on the door that stopped the unwelcome technology that was hitting my site, and someone deliberately programmed their way around it, isn't that technically hacking?

I'm thinking it's just about time to drop some money on a lawyer to see where this idea stands within the definition of the U.S. laws on hacking as a crime, as it would be nice just to scare the shit out of the local scrapers.

Not to mention it would sure be cool to put a logo on my site that basically explains to these assholes:

"Crawl my site, go to jail, it's the law."

Who am I kidding?

Some asshole in some foreign country would just start a whole new business selling scraping services or pre-scraped websites if they aren't doing it already.

Then we're right back to fighting copyright infringement.


Anonymous said...

IF. . . I had the resources I would pursue something like this.

Two years ago, before I knew anything about scrapers, I found a dozen or so networks (over 1,500 MFA sites) that had scraped my content: hundreds of thousands of pages. I got hit by a huge dup page filter, which is when my bot/scraper education began.

It's taken over 18 months and I'm just now starting to recover.

There was real financial damage from the theft and if I had the money I would chase them down with John Doe lawsuits and subpoenas.

To quote you, Bill: "FUCK 'EM"


Anonymous said...

It was several years ago that I discovered my sites had been copied en masse and republished in Taiwan. The network there was restricted to Taiwan, sort of like an AOL network for Taiwanese people. Each time my site updated, their copy updated a few days later. Of course they had ads all over it.

Since then I've stopped looking for my web content in Taiwan. I feel much better now. (Oh, and did I mention that I started publishing subtle references to The People's Republic of Taiwan?)