Saturday, January 31, 2009

Iterasi Archives Sites Without Permission

Guess what boys and girls?

There's another wonderful new site that allows people to copy your shit without your permission!

Iterasi allows their members to "archive" individual web pages.

The pages on my site carry a "NOARCHIVE" meta tag, which tells everyone DO NOT ARCHIVE this page, yet they archived it anyway. They also stripped out my frame-busting JavaScript, so they are actively thwarting sites that don't want to participate in their tool.
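
For the record, honoring that tag is not hard. Here's a rough Python sketch of the check a polite archiver could run before saving a page. It assumes the tag is written in the usual robots form, <meta name="robots" content="noarchive">, and the names RobotsMetaParser and may_archive are made up for illustration; I have no idea what Iterasi's code actually looks like.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def may_archive(html: str) -> bool:
    """Return False when the page asks not to be archived."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="ROBOTS" content="NOARCHIVE"></head></html>'
    print(may_archive(page))
```

If may_archive() comes back False, the right move is to skip the page, not to save it and strip the scripts.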

Since Iterasi is in beta, maybe I'll cut them a little slack. Very little, but just a bit.

On their web site it says:

"At iterasi, we love the Web. So much so, that we want to keep it. Forever."

If you really loved the web, you would follow standard web protocols. If the webmaster gives you permission, fine, do whatever you want.

For those of us that don't allow it, back the fuck off.

Here are the IP and user agent details:
198.145.117.78

"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; SLCC1; .NET CLR 2.0.50727)"
They operate out of this IP range:
OrgName: Infinity Internet, Inc.
NetRange: 198.145.0.0 - 198.145.255.255
CIDR: 198.145.0.0/16
Infinity Internet is a mixed service with both hosting and business/residential DSL services so blocking the whole range probably isn't safe.
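
To put a number on "the whole range": a /16 covers 65,536 addresses, which is a lot of DSL customers to lock out over one archiver. Here's a small sketch using Python's standard ipaddress module to test whether a visitor falls inside that block. The helper name in_range is mine, purely for illustration.

```python
import ipaddress

# Range taken from the whois output above.
INFINITY_RANGE = ipaddress.ip_network("198.145.0.0/16")


def in_range(ip: str, network=INFINITY_RANGE) -> bool:
    """Return True if the address falls inside the given network."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    print(in_range("198.145.117.78"))    # the Iterasi visitor
    print(INFINITY_RANGE.num_addresses)  # how many addresses a /16 block covers
```

A check like this only tells you the visitor belongs to Infinity Internet, not whether it is a colo box or somebody's home connection, which is why the range alone is too blunt an instrument.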

The reverse DNS shows:
pointer ip78.117.colo.iinet.com.
For the time being, you can opt out of Iterasi by blocking anything whose reverse DNS contains ".colo.iinet.com", which seems to stop them dead in their archiving tracks.
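
If you want to try that block yourself, here's a minimal Python sketch. It is an assumption-laden illustration, not my production setup: the function names are invented, the lookup uses the standard socket.gethostbyaddr call, and the match is a suffix match rather than a loose "contains" so a hostname like something.colo.iinet.com.example.net doesn't trip it.

```python
import socket

# Hostname suffix from the reverse DNS record shown above.
BLOCKED_RDNS_SUFFIX = ".colo.iinet.com"


def hostname_is_blocked(hostname: str) -> bool:
    """Match the rDNS name against the blocked suffix, ignoring case and a trailing dot."""
    return hostname.lower().rstrip(".").endswith(BLOCKED_RDNS_SUFFIX)


def ip_is_blocked(ip: str, resolver=socket.gethostbyaddr) -> bool:
    """Reverse-resolve the IP and test the resulting hostname.

    The resolver parameter exists only so the lookup can be swapped out
    (for caching or testing); by default it does a live DNS query.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # No PTR record or the lookup failed: nothing to match against.
        return False
    return hostname_is_blocked(hostname)


if __name__ == "__main__":
    print(hostname_is_blocked("ip78.117.colo.iinet.com."))
```

Reverse lookups are slow, so anything like this should cache its answers instead of hitting DNS on every request. It also leaves the DSL customers in the same /16 alone, as long as their hostnames don't end in the colo suffix.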

Here are a few things Iterasi could do so webmasters don't get hostile:
  • Honor robots.txt
  • Honor meta tags like NOARCHIVE
  • Provide a user agent string that identifies Iterasi accessing a site
  • Provide reverse DNS so we can tell it's your company and not a spoof
Until that time, I have you permanently blocked and I'm sure others will soon.
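
The first item on that list is about as cheap as it gets. Python even ships a robots.txt parser in the standard library (urllib.robotparser), so a sketch of the check looks like this. The user agent "ExampleArchiver" and the helper allowed_by_robots are hypothetical; Iterasi doesn't announce a user agent of its own, which is half the problem.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A robots.txt that shuts out one named (hypothetical) archiver entirely.
    rules = "User-agent: ExampleArchiver\nDisallow: /\n"
    print(allowed_by_robots(rules, "ExampleArchiver", "http://example.com/page.html"))
    print(allowed_by_robots(rules, "SomeOtherBot", "http://example.com/page.html"))
```

Of course, this only works if the tool sends a user agent that webmasters can name in robots.txt, which is why the third item on the list matters just as much.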

7 comments:

Jonas said...

I love your rants. You are always faster than me finding the bad ones. Thanks.

EDELBABE said...

Well, we have just sent this e-mail to them:

"Ladies and gentlemen,

it has come to our attention that you indexed/grabbed our website.

Due to the nature of your robot (it fakes itself to be a regular webbrowser by using a fake user agent string), we doubt that your company is trustworthy.

Furthermore, you do not provide information on how to block your robot in a friendly way (see the W3C web standards) using a robots.txt entry or any other way. This also implies that your company is "up to no good".

Anyway, we herewith give you the one-time opportunity to remove the indexed data of www.edelbabe.net from your archives and to stop robot-parsing the site immediately.

As an alternative, you can provide information about blocking your robot using the regular robots.txt file.

If that also fails, a third and final alternative would be to provide all the IP numbers your robot might be using, so we can block those.

Should you not cooperate in any of the three alternatives we provide and continue to parse our website, we will hand over this issue to our lawyers.

Have a nice day!

webmaster@edelbabe.net"

Anonymous said...

Did you ever get an answer? We didn't. We like our privacy.

Anonymous said...

Iterasi is far from new.

IncrediBILL said...

I hadn't seen it before or blogged about it, so it was new to me and a new blog post, which has no bearing on whether Iterasi itself is technically "NEW".

Anonymous said...

Any word back about this? I just found our sites in there via search engines. This can't be legal...

Anonymous said...

Ah, but how do you block effectively without collateral damage? Infinity Internet, like so many others, offers a variety of services, including residential Internet access. Wouldn't blocking their whole range be overkill? Has anyone figured out a more "gentle" approach?

I wish there was some sort of agreement/requirements that hosting companies identified their colo- and hosting ranges. That would go a long way in curbing online abuses.