10/06/08 - A day that will live in internet infamy as the day a prominent internet company caught millions of webmasters off guard and sent shockwaves around cyberspace.
The event that caused this uproar was the launch of Linkscape, with a supposed 30 billion pages indexed. It stunned even the most savvy webmasters, because they never saw it crawling and were taken totally by surprise.
This new snooping SEO tool is billed as "An Index of the World Wide Web – 30 billion pages (and growing!), refreshed monthly", which has left webmasters who are already battered and abused by a massive onslaught of automated bots angrier than ever.
The internet entitlement mentality thinks that all webmasters have unlimited bandwidth and CPU and that anything that's online should just be taken without regard to the consequences.
Webmasters will no longer tolerate Indexation Without Representation and are moving to regain control over their sites, their content, and their competitive intelligence. Many webmasters who previously called bot blockers paranoid, draconian control freaks are now crying out for solutions to high-profile marauders raiding their sites and reaping large profits. Now that the tide has turned, webmasters are preparing for the revolution with new sites such as the NoArchive Initiative, better bot blocking scripts, honey pots and much more.
Even a competing site, MajesticSEO, which provides a similar product, gives free multi-page reports on your domains if you register and prove you own the site. That is at least a symbiotic relationship and not a completely parasitic one.
Linkscape, however, not only gives nothing back to webmasters for allowing their sites to be crawled and mined for competitive intelligence, it actually raised the price of access to its tools by $30, so you have to pay more for the privilege of being crawled just to see your own data!
New Pricing, featuring three levels of PRO membership depending on the size and needs of our members. Current PRO members need not worry - you'll be grandfathered in at the current price level. We're just creating two new echelons for those who need access to more. If you'd like to lock in at the current price level ($49/month), I won't stop you :-)
So what benefit do we all get from all this?
For a monthly fee our competitors can see everything we do in infinite detail.
Wow! That's a benefit?
Sorry, but that just doesn't play homey.
8 comments:
Uhm... what's their netblock again?
Will this see it off, or do I need more?
User-agent: dotbot
Disallow: /
This bot is worse than a plague. I've seen it in three different formats since August. It was having a seriously deleterious effect on my site until I blocked it.
DotBot/1.0.1 (http://www.dotnetdotcom.org/#info, crawler@dotnetdotcom.org)
DotBot/1.0.1 (http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)
Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)
ros: In my experience it does not obey robots.txt so, while that's the correct format, the bot will just ignore it and crawl your site anyway. You need stronger measures and that really sucks.
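For anyone wondering what "stronger measures" might look like, here is a minimal Python sketch that refuses requests at the server instead of relying on robots.txt. It assumes the site runs behind a WSGI stack. The names `is_blocked_agent` and `block_bots` are made up for illustration, and the only pattern used is the "DotBot" token from the user-agent strings quoted above. A crawler that fakes its user agent will sail straight past this, so treat it as a first line of defence only.

```python
import re

# Hypothetical blocklist: "dotbot" comes from the user-agent strings quoted
# in the comments above. Extend the pattern to suit your own logs.
BLOCKED_AGENT_PATTERN = re.compile(r"dotbot", re.IGNORECASE)


def is_blocked_agent(user_agent):
    """Return True when the User-Agent header matches the blocklist."""
    return bool(user_agent) and BLOCKED_AGENT_PATTERN.search(user_agent) is not None


def block_bots(app):
    """Wrap a WSGI app so blocked agents get a 403 instead of a page."""
    def middleware(environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the real application untouched.
        return app(environ, start_response)
    return middleware
```

Unlike a robots.txt rule, this does not depend on the bot's cooperation: the request is answered with a 403 whether or not the crawler reads or honours robots.txt.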
If all those webmasters didn't notice Linkscape's crawling, then Linkscape's crawling can't have impacted them very much, CPU- or bandwidth-wise, can it? Which means Linkscape "took" nothing from them, cost them basically nothing, but may drive them some traffic.
Those who did notice a deleterious effect from their bot have apparently blocked it.
I don't see why you're making a big deal out of this.
You're confusing our inability to sort Linkscape out of the onslaught of crawlers, visible and stealth, with a lack of impact.
For instance, if I crawled your site using a list of 1K proxy IPs, I could easily download 10K pages in a hurry and have an impact but it would look indistinguishable from normal traffic.
The fact that it crawled and nobody knew it was them indicates something odd like that possibly happened.
Read the post highlighted as "battered and abused" in that article and it'll give you a clue about the scope of the noise for some of us.
"If I crawled your site using a list of 1K proxy IPs, I could easily download 10K pages in a hurry and have an impact but it would look indistinguishable from normal traffic."
Not if I don't normally get traffic at those levels. A big bolus of traffic on top of normal levels, with no apparent slashdotting (I'd look for a major blog or similar site in the referrer logs, of course), would suggest to me some sort of distributed-but-too-damn-fast crawl.
Identifying and blocking the source would be a trickier matter of course, but knowing that something had happened would be a cinch unless the added traffic was lost in the general noise, which on the other hand would mean the added traffic was harmless.
("Harmless" by an ordinary definition, mind you, not by your definition where any automated traffic, no matter at how low a level or whether or not it ever leads to any plagiarism, is inherently "harmful" unless it drives Google levels of traffic.)
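The "big bolus of traffic on top of normal levels" check from the comment above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: you already have request counts per time interval pulled from your logs, the first `baseline_len` intervals represent normal traffic, and "too much" means more than the baseline mean plus three standard deviations. The function name `find_spikes` and the threshold are illustrative choices, not anything proposed in the thread.

```python
from statistics import mean, pstdev


def find_spikes(counts, baseline_len, threshold=3.0):
    """Return indexes of intervals whose request count exceeds
    baseline mean + threshold * baseline standard deviation."""
    baseline = counts[:baseline_len]
    limit = mean(baseline) + threshold * pstdev(baseline)
    # Only intervals after the baseline window are checked.
    return [
        i for i, count in enumerate(counts)
        if i >= baseline_len and count > limit
    ]


# Requests per minute: six "normal" minutes, then one burst and one quiet minute.
requests_per_minute = [100, 110, 90, 105, 95, 100, 400, 102]
print(find_spikes(requests_per_minute, baseline_len=6))
```

This flags only the burst. A crawl spread thinly enough to stay under the limit is never flagged, which is exactly the "lost in the general noise" case the two sides are arguing over.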
@Anonymous
"Linkscape "took" nothing from them, cost them basically nothing, but may drive them some traffic."
Do you have some rough figures on traffic conversion from competing SEOs? ;)
Just to update though, it seems there is no Moz crawler after all. They have mashed data from several data sources and APIs. At least that clears up the UA issue, so thanks to them for that.
More can be found here: http://sphinn.com/story/79700
"Do you have some rough figures on traffic conversion from competing SEOs? ;)"
No, but then "competing" is not the same thing as "cheating" and is not wrong.