Thursday, February 15, 2007

The Gigablast to Google Content Connection

Some of my "scraped" content kept showing up in places not expressly authorized to have it. This was making me a little batty, as I was pretty sure the bot blocker wasn't letting these people through and that my code didn't have holes like Swiss cheese. Then I figured it out: there was a clue embedded in some of the data, which included one of my tracking bugs, and it turns out the data originated from Gigablast.

Knowing it came from Gigablast, I looked up Gigablast's list of partners and VOILA! there was the site in question listed in their partner list.

Now comes the dilemma of what to do about this situation, as I'm not happy with a couple of their partners, and by allowing Gigablast I'm permitting the partners access by default.

Worse yet, Google indexes the Gigablast data that's present in their partner sites, like Eurekster, so here you are competing with your own content in Google yet again via the Gigablast connection.

Since I really don't get any noticeable traffic from Gigablast or any of their partners, maybe it's time to cut the umbilical cord just to keep my own information from being used against me to rank their partner sites in Google.
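For what it's worth, cutting the cord in robots.txt would only take a couple of lines. Here's a minimal sketch that checks such a rule with Python's standard urllib.robotparser. The example.com URL is a placeholder, Googlebot is only there as a comparison agent, and the whole thing assumes Gigabot actually honors robots.txt, which is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Gigabot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: Gigabot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/some-page.html"  # placeholder URL
    print("Gigabot allowed:", allowed("Gigabot", page))
    print("Googlebot allowed:", allowed("Googlebot", page))
```

The catch is that this is all or nothing: blocking Gigabot drops me from Gigablast itself along with every partner that takes its data.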

Looks like we need some robots.txt commands that we can use to tell search engines like Gigablast it's OK to index, but not OK to share with Snap, for instance.

Maybe implement something like this in robots.txt for search engine partner control:

User-agent: Gigabot
ShareDeny: SNAP
ShareDeny: Eurekster
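
To show how little a crawler would need to do to respect this, here's a rough Python sketch of a parser for it. Keep in mind ShareDeny is purely my wish-list directive and no search engine supports it as far as I know; the function names parse_share_deny and may_share are made up for illustration.

```python
from collections import defaultdict


def parse_share_deny(robots_txt):
    """Map each user-agent (lowercased) to the partners it may not share with.

    ShareDeny is a hypothetical directive; this is not part of any
    robots.txt standard.
    """
    denied = defaultdict(set)
    agents = []        # user-agents named by the group being read
    in_rules = False   # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        else:
            in_rules = True
            if field == "sharedeny" and value:
                for agent in agents:
                    denied[agent].add(value.lower())
    return dict(denied)


def may_share(denied, agent, partner):
    """True unless `partner` is share-denied for `agent`."""
    return partner.lower() not in denied.get(agent.lower(), set())


EXAMPLE = """\
User-agent: Gigabot
ShareDeny: SNAP
ShareDeny: Eurekster
"""

if __name__ == "__main__":
    rules = parse_share_deny(EXAMPLE)
    print(may_share(rules, "Gigabot", "Snap"))
    print(may_share(rules, "Gigabot", "SomeOtherPartner"))
```

Partner names are compared case-insensitively here, which is my own assumption; a real spec would have to say how partners are identified.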

It feels almost as bad as, if not worse than, battling a scraper, but this time I let this one in the front door with my blessings.

To block or not to block, THAT is the question...

4 comments:

Anonymous said...

Hi Bill, did you block, and what were the results? I am getting suspicious of outbotfox. I was just doing some searching and discovered your wonderful blog, a bookmarked fav now :)
I remember your strong knowledge weight and helpful posts on Webmaster World.

IncrediBILL said...

Thanks Mick.

For the moment I'm not blocking them, but I did run this issue by someone at a search engine to see what their thoughts were on indexing resold indexed content.

I'm keeping an eye on the problem for now to see if I end up seriously competing with my own information on other sites. So far it's minor.

SFX Performance said...

Hiya Bill,
Curious to see if you finally blocked Eurekster? I've been battling against my own content on them for some time and I'm pretty confident that their results are harming mine in Google.

Have you blocked them yet?

IncrediBILL said...

I haven't because the only way to get out of Eurekster seems to be blocking Gigablast's bot.