Saturday, November 17, 2007

Don't Just Block Spam, Block Spammers Too!

Most modern blog anti-spam efforts are based on protecting just the comment forms, which is a very narrow approach. When a spambot or a person posts something bad, it's automatically trapped and discarded by tools like Akismet. However, I don't think this goes far enough to solve the problem, as it only puts a band-aid on the comments page.

What I'm going to suggest, and what I recently did on a few of my sites, is to go a step beyond the comments page and punish bad behavior with banishment.

Why not ban the spammer?

You've trapped the spam and you know he/she/it is up to no good, so why let them continue to access your site at all?

What if tools like Akismet not only blocked the spam but also locked the spammer out of every site running Akismet worldwide?

If Akismet and a bunch of the other anti-spam tools could pool their spammer data, you could effectively block spammers from ever accessing any website again.
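
To make the pooling idea concrete, here's a minimal sketch of how participating sites might share one ban list: a spam verdict on any site bans the sender's IP on all of them, and the ban is enforced on every page, not just the comment form. SharedBanList and Site are hypothetical names made up for illustration; Akismet does not offer a ban API like this, and a real pool would live in a shared service rather than in memory.

```python
class SharedBanList:
    """Hypothetical in-memory stand-in for a ban list pooled across sites."""

    def __init__(self):
        self._banned = set()

    def report(self, ip: str) -> None:
        # Called when any participating site traps spam from this IP.
        self._banned.add(ip)

    def is_banned(self, ip: str) -> bool:
        return ip in self._banned


class Site:
    """Hypothetical site that checks the shared pool on every request."""

    def __init__(self, name: str, pool: SharedBanList):
        self.name = name
        self.pool = pool

    def handle_comment(self, ip: str, is_spam: bool) -> str:
        if self.pool.is_banned(ip):
            return "403 banned"
        if is_spam:
            # Trap the spam AND report the sender to everyone in the pool.
            self.pool.report(ip)
            return "spam discarded, sender banned"
        return "comment accepted"

    def handle_request(self, ip: str) -> str:
        # Front-end check on every page, not just the comment form.
        return "403 banned" if self.pool.is_banned(ip) else "200 OK"
```

For example, once blog A traps spam from 192.0.2.10 via handle_comment, a plain page request to blog B from that same IP comes back "403 banned".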

Now THAT's how you punish a spammer: ban him from the worldwide community!

This is not a new concept, as RBLs (Realtime Blackhole Lists) have been used for this kind of thing in the past: spammers' IPs were not only used to block incoming mail but were also added to the server firewall. However, the more recent web-based technologies have tended to be very narrowly focused and have missed the bigger opportunity to thwart problem spammers in a better way, such as ACCESS DENIED to the web in general.
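
For anyone who hasn't worked with them, RBL-style lists are normally queried over DNS: reverse the octets of the IPv4 address, append the list's zone, and look up an A record. An answer means the IP is listed, while "no such name" means it isn't. Below is a minimal sketch of that lookup. The zone name dnsbl.example.org is a placeholder, not a real list, and pushing listed IPs into a firewall is left out.

```python
import ipaddress
import socket

# Placeholder zone; substitute a real DNSBL you are permitted to query.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = DNSBL_ZONE) -> bool:
    """True if the zone answers with an A record, i.e. the IP is listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or any lookup failure) is treated as "not listed".
        return False
```

So a check on 192.0.2.10 asks DNS for 10.2.0.192.dnsbl.example.org. Treating lookup failures as "not listed" errs on the side of letting visitors in when the list is unreachable.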

Consider that many modern, well-protected websites that are cranking up security already block access from data centers and proxy servers, leaving spammers few options besides direct residential connections and botnets. If spammers rent botnets, those would have to be made up of hijacked residential PCs, since servers in data centers won't do them much good when they're often blocked already. Therefore, if spammers were forced to use botnets to do their bidding, they would unwittingly get innocent people blocked, and those people would soon discover their machines are infected and get them fixed.
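
Blocking data centers usually comes down to checking each visitor's IP against a list of network ranges. Here's a minimal sketch using Python's ipaddress module. The two ranges are RFC 5737 documentation blocks used purely as stand-ins; a real list would be built from the hosting and data center providers you decide to block.

```python
import ipaddress

# Stand-in ranges (RFC 5737 documentation blocks), NOT real data centers.
DATA_CENTER_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def from_data_center(ip: str) -> bool:
    """True if the visitor's IP falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed lists are safe.
    return any(addr in net for net in DATA_CENTER_RANGES)
```

A linear scan is fine for a short list; with thousands of ranges you'd want a sorted or trie-based lookup instead.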

What a concept!

Ostracizing spammers could even get people with compromised PCs off the botnet!

Spammers would think twice about spamming again if each attempt permanently cost them more and more access to the web. So maybe, just maybe, we can end spam in our lifetime just by changing how anti-spam technology is deployed: make it a complete front-end security system for the website, where the comment form triggers the alarm and alerts the entire anti-spam community.

OK, there could be a few innocent casualties, but the greater good of permanently eradicating spam, and even botnets, far outweighs the impact of a little friendly fire.

I'm banning spammers to clean up the online environment, how about you?

Friday, November 16, 2007

Microsoft Crawling with Perl Script?

I wonder what the boys in Redmond are up to, using Perl instead of one of their beloved Microsoft languages?

131.107.0.96 [tide526.microsoft.com.] requested 6 pages as "libwww-perl/5.805"
131.107.0.95 [tide525.microsoft.com.] requested 6 pages as "libwww-perl/5.805"
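
Spotting this kind of thing is easy to automate, because scripting libraries announce themselves in the user agent. The sketch below parses log summaries in the same shape as the two lines above and flags user agents from common scripting tools. The regex only fits this post's summary format (an assumption, not a standard log format), and the prefix list is just a starting point.

```python
import re

# Fits the summary format shown above (not a standard log format):
#   IP [reverse-DNS host] requested N pages as "user-agent"
LOG_LINE = re.compile(r'^(\S+) \[([^\]]*)\] requested (\d+) pages as "([^"]*)"$')

# Example user-agent prefixes of scripting libraries and CLI fetchers.
SCRIPT_UA_PREFIXES = ("libwww-perl", "Python-urllib", "curl/", "Wget/")


def flag_script_crawlers(lines):
    """Yield (ip, host, pages, ua) for lines whose UA is a scripting tool."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip lines that don't fit the format
        ip, host, pages, ua = m.groups()
        if ua.startswith(SCRIPT_UA_PREFIXES):
            yield ip, host, int(pages), ua
```

Fed the two Microsoft lines above, it yields both, e.g. ("131.107.0.96", "tide526.microsoft.com.", 6, "libwww-perl/5.805"). A regular MSIE user agent passes untouched.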

Makes you go Hmmmm...

Thursday, November 15, 2007

That Rant Wasn't About Anal Sex!

My heartwarming Christmas rant from last year, entitled "Good Will Toward Men but FUCK WOMEN DRIVERS", has almost ranked in the top 10 for anal sex: it's sitting at #8 for 'but fuck'.

Ah well, one "T" short of major porn affiliate ads running on the site.

Maybe next year we'll be blessed with a 'butt fuck' ranking.

Sigh, until then I can only dream of free porn money...

Sunday, November 11, 2007

Attributor Post-Mortem Copyright Compliance Revisited

My first post about the emergence of Attributor was about a year ago, so I thought it was time to review what we've learned since then.

Here are the places we've spotted them crawling from:

63.209.14.55 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Proxy VIA=1.1 ind27.attributor.com:3128 FORWARD=10.50.40.74

63.209.14.10 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Proxy VIA=1.1 ind25.attributor.com:3128 FORWARD=10.50.40.74

209.51.152.146 "Attributor.comBot"

66.231.188.172 "Attributor.comBot"

63.209.14.53 "Mozilla/5.0 (compatible; dejan/1.13.2 +http://www.attributor.com)"

63.209.14.7 "Mozilla/5.0 (compatible; dejan/1.13.2 +http://www.attributor.com)"

Now the amusing part is the IP 209.51.152.146. It's a proxy, and it appears they aren't any smarter than the rest of the bots, as 340+ crawls have come via that IP this year, including msnbot, Googlebot, Twiceler, Gigabot, Snapbot and some others. So you're in fine company with the other stupid crawlers out there.
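
The VIA= and FORWARD= fields in the first two log entries above are the telltale signs of a proxy relaying the request. Here's a minimal sketch of detecting those on the fly. It assumes those fields come from the standard Via header and the de facto X-Forwarded-For header (plus the newer Forwarded header), and it only catches proxies that actually send them. Anonymous proxies strip these headers and have to be caught by IP instead.

```python
# Headers that commonly reveal a request was relayed by a proxy.
# Assumption: the log's VIA= and FORWARD= fields map to Via and X-Forwarded-For.
PROXY_HEADERS = ("Via", "X-Forwarded-For", "Forwarded")


def proxy_info(headers: dict) -> dict:
    """Return any proxy-revealing headers present (names matched case-insensitively)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {h: lowered[h.lower()] for h in PROXY_HEADERS if h.lower() in lowered}


def is_proxied(headers: dict) -> bool:
    return bool(proxy_info(headers))
```

HTTP header names are case-insensitive, which is why the lookup lowercases them before matching.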

What's curious is that 66.231.188.172 is one of Gigablast's IPs. Some of the others may be as well, since they resolve to Level 3 blocks just like other Gigablast IPs do, but I didn't look hard enough to confirm. Lazy, I guess.

Now let's examine one of my favorite statements on their website:
...you will no longer have to hold back top content or impose technical barriers on its viewing; instead, quality content can be made more easily available to a larger number of consumers.
Excuse me?

My technical barriers [not used on this blog] stop the problem in the first place, so I don't need to pay anyone to chase my content around the billions of pages on the web. As a matter of fact, my technical barriers are what trapped your crawl attempts above and identified which IPs your bots were using. That means your technology can't get past my technology, so you'll never know if I'm stealing anyone's material, but I'm pretty sure you aren't stealing my bandwidth finding out.

So now you have to ask yourself which is the easiest way to stop content theft: blocking data centers and bulk downloaders on the fly, or scanning billions of web pages looking for theft after the cows have already left the barn?
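
"On the fly" bulk-downloader blocking can be as simple as counting each IP's requests in a sliding time window and quarantining anything that goes over the limit. Below is a minimal sketch of that idea. BulkDownloadDetector is a hypothetical name, and the default of 30 pages per 60 seconds is a made-up threshold you'd tune against your own traffic.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class BulkDownloadDetector:
    """Quarantine IPs that request more than max_requests pages within window seconds.

    The defaults are hypothetical; tune them for your own site.
    """

    def __init__(self, max_requests: int = 30, window: float = 60.0):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        if ip in self.blocked:
            return False  # once quarantined, stay blocked
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.blocked.add(ip)  # quarantine on the fly
            return False
        return True
```

A visitor reading a page every few seconds never trips it, while a crawler pulling dozens of pages a minute gets quarantined on the first request over the limit.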

Bot blocking wins hands down, as it's without a doubt more cost-effective.

The best part is that if someone wants to license your content, you'll get 100% of the profits and won't have to share them with some company that chases around the vast wasteland of the web looking for violators.

Maybe Attributor crawls from some other places without a user agent identifying the source, but that just means the bot blocker will stop and quarantine some anonymous IP address, and we may never know it's really them.

It doesn't matter; I'm still banking on proactive content theft prevention technology rather than reactive technology, since it's easier to keep your cows at home when the fences are closed and patrolled in the first place than to round 'em up later.