Monday, March 06, 2006

Pathologically Extreme

Yep, that's what my bot busting obsession was called today in private email.

Now that I'm "pathologically extreme" I must thank the person for his bluntness, as it did bring up the point that there's a lot more to this bot busting issue than someone sitting on the sidelines, only casually familiar with my quest and this blog, may know.

In all fairness, if you had told me 12 months ago that I'd be on this quest to abolish unauthorized access to my site, I would've laughed in your face and said "what harm does a little crawling do anyway?" And yes, I used to hold those tightly wound content control freaks in low regard as misguided time wasting fools.

However, then I decided to get out of the consulting game and focus more attention just on my own web sites which bring in a decent revenue stream without all of the whining and hassles of customers.

That's when all hell broke loose, as suddenly both the spammers and scrapers started hammering my old server so hard it was going down all the time. Not physically crashed mind you, but it was just so busy serving the needs of spammers and scrapers that my income needs weren't being met whatsoever. We're talking what amounted to DoS attacks because of the sheer speed and volume of this nonsense, and the server just didn't respond for 5, 10, 15 minutes at a shot, and the worst was 90. It got SO BAD at one point I had to completely get rid of server side spam filtering, as that tool itself could use up all the CPU when some spammer came along doing a pump and dump of spam.

These shameless greedy bastards were impacting my site, my SERPs, my wallet and really pissing me off - the shit had to stop.

First was the easy part, which was just getting rid of the spam. I blocked email coming from most of Asia and Russia, which eliminated the majority of the high speed spam dumps and gave me some breathing room to work. Then I made the only way to contact me a form on the web sites, eliminating all email addresses but 2, and literally set the server not to BOUNCE emails but REJECT emails. The reason is that with a bounce, the email still comes into your server, which then attempts to send a response back. Most spam has a bogus reply address, so thousands of bounce emails quickly fill up the queue and your email system grinds to a screaming halt processing bounce deliveries all day long. Trust me on this, just REJECT those undeliverable emails during the SMTP conversation: next to no bandwidth or CPU is wasted, as they just bounce off your server harmlessly, never to be seen again.
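To make the BOUNCE versus REJECT difference concrete, here is a minimal sketch in Python. It is a toy model, not a mail server config and not the setup used on my box: the mailbox list, `rcpt_response`, and `count_queued_bounces` are all made-up names for illustration. The only real-world pieces assumed are the SMTP reply codes 250 (accepted) and 550 (mailbox unavailable).

```python
# Hypothetical sketch: the addresses and function names are illustrative only.
VALID_MAILBOXES = {"contact@example.com", "billing@example.com"}


def rcpt_response(recipient: str) -> tuple[int, str]:
    """Decide the SMTP reply to a RCPT TO command for one recipient."""
    if recipient.strip().lower() in VALID_MAILBOXES:
        return 250, "OK"
    # 550 refuses the message during the SMTP session, so nothing is stored.
    return 550, "5.1.1 Mailbox unavailable"


def count_queued_bounces(recipients, reject_at_smtp: bool) -> int:
    """Count bounce messages that land in the local queue for a batch of mail."""
    queued_bounces = 0
    for rcpt in recipients:
        code, _ = rcpt_response(rcpt)
        if code == 250:
            continue  # deliverable mail never generates a bounce
        if not reject_at_smtp:
            # BOUNCE mode: the mail was accepted first, so the server now owes
            # a non-delivery report to a (probably bogus) sender address.
            queued_bounces += 1
        # REJECT mode: the sending side gets the 550 and we keep nothing.
    return queued_bounces


spam_run = [f"nobody{i}@example.com" for i in range(1000)]
print(count_queued_bounces(spam_run, reject_at_smtp=False))
print(count_queued_bounces(spam_run, reject_at_smtp=True))
```

In bounce mode every undeliverable message leaves one more item for the queue to chew on; in reject mode the queue stays empty no matter how big the spam run is.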

Guess what?

Asia and Russia are no longer blocked as REJECTing their emails stopped them from being a threat.

At this point there are never more than 10 emails sitting in my mail queue at any time and the spam that gets thru is literally a handful of emails a day, blissfully under control, I love it.

However, after solving this problem the old server was still going down like it was under a spam attack. After a while I came to the conclusion that my site was probably just too busy, that my older, slower server just couldn't deal with the demands of all the visitors, search engines, etc., and I set out to upgrade.

Now, with a big fast shiny dual Xeon box it's back up and running faster than ever.

Two weeks later some fuckers took it offline for 90 minutes in the middle of the night and I lost my shit. That was it, the straw that broke the camel's back, no more Mr. Nice Guy.

.... this was war....

Then the whole process kind of evolved into a huge eye opening adventure, and being a naturally curious guy and a programmer with a huge ego [yes, I am IncrediBill and I can stop these bastards] it kind of took on a life of its own.

First, stopping the high speed scrapers was easy, totally child's play.
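For anyone wondering what "easy" can look like, here is one common way to catch high speed scrapers: a per-IP sliding window request counter. This is a generic sketch, not the actual code running on my sites, and the `RateLimiter` class, its thresholds, and the IP address are all assumptions picked for illustration.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds per IP."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip: str, now: float) -> bool:
        """Return True if this request may be served, False if it should be blocked."""
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many requests too fast: treat as a scraper
        hits.append(now)
        return True


limiter = RateLimiter(max_requests=3, window_seconds=1.0)
for t in (0.0, 0.1, 0.2, 0.3, 1.05):
    print(t, limiter.allow("203.0.113.7", now=t))
```

A human clicking around rarely trips a limit like this, while a bot pulling pages as fast as the pipe allows hits it within seconds. The timestamp is passed in rather than read from the clock so the behavior is easy to check.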

Next, the sheer volume of scraping became apparent once I was monitoring real-time site activity while squashing the high speed scrapers and looking for other unauthorized resource wasting bots.

Evolution just kept happening as one thing led to another: stopping more scrapers unearthed even more scrapers, the errors I fed scrapers unveiled tons of sites with MY SHIT on them, and it turned out many people had apparently built AdSense-incentivized businesses by bottom feeding off my business, diluting keywords I was earning money from by using my own content against me.

OK, now THAT pissed me off even more.

So while some of you may call the depths and extremes I'm going to in order to protect my shit "pathologically extreme", my side of the story is self-defense for my very survival, and I'll be damned if some bottom-feeding leeches are going to take me down without a good fucking fight.

Yes, that's it people, as far as I'm concerned at this point it's a fight to the death, theirs and not mine if I have anything to say about it. So far I've spent a ton of time and money addressing the issues and at this point it's paying off. For how long, only time will tell, but it's definitely going to be a death match for one of us.

Now, go buy my CDs and t-shirts in the lobby to help fight the cause and invite me as a motivational speaker at your next nerd conference to spread the word!

Nah, that would be WAY too extreme!

3 comments:

Anonymous said...

Wow, you really should sell those scripts of yours. I am sure many webmasters are tired of paying for the spammers' fortunes, but don't have the knowledge to block them like this (or even detect them). Ever consider selling an eBook with how-to and scripts?

Anonymous said...

Welcome to the Information Castle Age where there be brigands and dragons and monsters oh my.

I enjoy following your exploits. Perhaps an animated series: 'Invasion of the Content Snatchers' starring IncrediBill?

I have done much the same the past few years but with two decided advantages: my name (real or nick) is not associated with particular sites nor with anti-bot behaviour.

I like a fair fight: from behind, without warning, with a 2x4.

Each day's installment reads like the old matinee movie serials. Great stuff.

Anonymous said...

I'm going to start warming up my script collection. I think I still have accounts sitting across plenty of large pipes ;-).