Saturday, April 21, 2007

Gigablast Data Trail

While tracing where my data goes on the internet, I found a couple of sites that appear to be using data from Gigablast, including eWoss and searchEstate.

The upside is fewer crawlers hitting your site, since these sites reuse the data from an existing crawl in multiple locations.

The downside is that you have no control over where your information shows up, so the only way to control that relationship is to block the source.
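For anyone wondering what blocking the source might look like, here's a rough sketch using a robots.txt rule, checked with Python's standard urllib.robotparser. The robots.txt content, the is_allowed helper, and the example.com URL are all made up for illustration, and I'm assuming "Gigabot" is the user-agent token the Gigablast crawler identifies itself with. Keep in mind that robots.txt only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler, leave everyone else alone.
# "Gigabot" is assumed to be the token Gigablast's crawler sends.
ROBOTS_TXT = """\
User-agent: Gigabot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    for agent in ("Gigabot/3.0", "Googlebot/2.1"):
        print(agent, "allowed" if is_allowed(agent, page) else "blocked")
```

Note that the rule is all or nothing: blocking the source crawler also removes you from every site that reuses its data, whether you mind that particular site or not.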

Update: Also found Gigablast content in Webled.

4 comments:

Unknown said...

You sure do seem to be obsessed with controlling how people use stuff you've created, even after it's left your hands. How would you like it if GM told you how you could drive the car you bought from them, and on what roads?

Anonymous said...

What data? Your blog?

IncrediBILL said...

a) No, not my blog, stop being silly.

b) It's a matter of both disclosure and reputation management when the lesser crawlers you might allow access don't actually disclose where your data goes. I'm just reporting where I see it, and I may or may not block them from my site if that's what it takes to stop my content being associated with sites that I don't like.

Anonymous said...

We went noarchive because we were tired of being scraped through Google and Yahoo.

We turned off access to anything except the big 4 search engines because we were tired of being scraped by "search engines" like Gigablast or Genieknows.