Sunday, November 11, 2007

Attributor Post-Mortem Copyright Compliance Revisited

My first post about the emergence of Attributor was about a year ago, and I thought it was time to review what we've learned since then.

Here's where they've crawled from that we've spotted:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Proxy VIA=1.1 FORWARD= "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Proxy VIA=1.1 FORWARD= "Attributor.comBot"
"Attributor.comBot"
"Mozilla/5.0 (compatible; dejan/1.13.2 +"

Now the amusing part is the IP, as it's a proxy, and it appears they aren't any smarter than the rest of the bots: 340+ crawls have come via that IP this year, including msnbot, Googlebot, Twiceler, Gigabot, Snapbot and some others, so you're in fine company with the other stupid crawlers out there.

What's curious is that it's one of Gigablast's IPs, and some of the others may be as well, since they resolve to Level 3 blocks just as other Gigablast IPs do, but I didn't look hard enough to confirm; lazy, I guess.

Now let's examine one of my favorite statements on their website: "... will no longer have to hold back top content or impose technical barriers on its viewing; instead, quality content can be made more easily available to a larger number of consumers."
Excuse me?

My technical barriers [not used on this blog] stop the problem in the first place, just so I don't need to pay anyone to go chasing my content around the billions of pages on the web. As a matter of fact, my technical barriers are what trapped your crawl attempts above and identified which IPs your bots were using. That means your technology can't get past my technology, so you'll never know if I'm stealing anyone's material, but I'm pretty sure you aren't stealing my bandwidth finding out.
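The kind of barrier described above can be sketched as a simple request filter that checks the user agent and the source IP against blocklists before serving a page. This is a minimal illustration, not the author's actual implementation; the blocklist contents here are stand-ins (the documentation-only TEST-NET range instead of real data-center blocks), and `is_blocked` is a hypothetical name.

```python
import ipaddress

# Hypothetical blocklists for illustration only. A real deployment would
# maintain much larger lists of known bot signatures and data-center ranges.
BLOCKED_AGENT_SUBSTRINGS = ["Attributor.comBot", "dejan/"]
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # TEST-NET stand-in

def is_blocked(user_agent: str, ip: str) -> bool:
    """Return True if a request matches a banned agent string or IP block."""
    ua = user_agent or ""
    if any(sig in ua for sig in BLOCKED_AGENT_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The point of checking at request time is exactly what the post argues: the bot is stopped (and its IP logged) before it ever gets a page, rather than being discovered after the content has already been copied.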

So now you have to ask yourself which method is easier for stopping content theft: blocking data centers and bulk downloaders on the fly, or scanning billions of web pages looking for theft after the cows have already left the barn?

Bot blocking wins hands down, as it's far more cost-effective.

The best part is that if someone wants to license your content, you'll keep 100% of the proceeds instead of sharing them with some company that wants to chase violators around the vast wasteland of the web.

Maybe Attributor has some other places they crawl from without the user agent identifying the source, but that just means the bot blocker will stop and quarantine some anonymous IP address and we may never know it's really them.

Doesn't matter; I'm still banking on proactive content-theft prevention technology, not reactive technology, as it's easier to keep your cows at home when the fences are closed and patrolled in the first place than to try to round 'em up later.


Ban Proxies said...

I see Attributor as a validation tool for the proactive approach. Show a major content publisher how many times they have been ripped off and use the very same tool to prove to them that the "proactive approach" is superior.

I know of a character who is very vocal about a proactive approach that does work. If I were this person, I would be looking at patent possibilities and/or maybe an appliance setup. Large companies like plug-and-play appliances.

Now let's see, where were we...
Kick down some office doors at venture capital firms (who did Attributor use?), prove a superior product, perfect it, and wait for the likes of Cisco to come knocking.

Maybe one day I'll mention this to a character named Bill.


Just day dreaming. Great post. Gotta go and get some work done :)

cdman83 said...

I'm a long-time reader (and sporadic commenter) of your blog, and recently I was put in a position that could help us both (by hopefully reducing the number of exploits floating around the 'net). Could you please contact me at so that I can give you more details?

Best regards.

Anonymous said...

Are you still getting Attributor crawlers? I have tried to reason with the company to stop crawling my site, but they simply lie to me about what they do. And they never promise to stop crawling my site, either.