Monday, May 12, 2008

Impact On Your Bandwidth Will Be Minimal My Ass

How often do we see that happy line of horse shit spread by every new startup that crawls the web about how minimal its impact will be?

Every fucking one of them claims it, but when you add them all together the bot traffic is quickly exceeding the human traffic.

Who the fuck am I kidding, on most sites the bots clearly outnumber the humans in pages read on a daily basis.

First we put the big search engines on top of the heap with Google, Yahoo and MSN crawling the crap out of your servers daily. Just these three guys can easily read as many pages as 10K visitors a day. Then throw in the wannabe search engines like Ask, Gigablast, Snap, Fast, etc. ad nauseam and it's over the top.

Now expand that list to include the international search engines like Baidu, Sogou, Orange's ViolaBot, Majestic12, Yodao, and on and on, tons of 'em.

Then we have all the spybots that feel entitled to crawl your site like Picscout, Cyveillance, Monitor110, Picmole, RTGI, and on and on.

Next add up all the specialty niche bots like Become, Pronto, OptionCarriere, ShopWiki, and all sorts of shit too numerous to mention.

Pile on top of this all the free fucking tools that every little shithead and make-believe company uses to scrounge the 'net for god knows what, and god's not telling, like Nutch and Heritrix, plus the web downloaders, offline readers, and more.

Don't forget, many of these so-called search engines and shit now want screen shots as well, so after they crawl your page they send a copy of Firefox or something to your site to download every page again plus every fucking image, never cached, over and over and over.

Did I forget to mention directories?

They'll want to link check you and get screen shots as well, don't leave them out or they'll feel fucking neglected.

Wait, there's more, those social sites like Eurekster, Jeteye, etc. that let people link to your shit and then come back banging on your site all the time to make sure that shit's still valid.

Then add up all the RSS feed readers and aggregators that pull down your RSS feeds that nobody ever fucking reads. Not to mention the RSS feed finders like IEAutodiscovery that run amok on your site just looking for RSS feeds ... FUCK!

If you run affiliate programs you have the CJ quality bot or some shit hitting your site, and if you run ads then it's the Google quality bot. It's always something.

Don't forget the assholes running the dark underbelly of the web with all the scrapers, spam harvesters, forum, blog and wiki spammers, botnets and other malicious shit pounding on our sites daily.

Add on top of all this shit Firefox, Google Web Accelerator and now AVG's toolbar all pre-fetching pages that will most likely never be read and holy shit, we're being swamped!

OK, now that we've identified all this bot traffic, where's all the fucking people?

Of course you think all those hits from MSIE and Firefox are people, right?

Hell no!

Are you out of your fucking mind?

Those hits are the scrapers, screen shot makers and companies like Cyveillance and Picscout that don't want you to stop them from crawling your site so they just pretend to be humans to get past the bot blockers.

Well guess what?

There are no fucking people on your site. The internet is now run for and used exclusively by bots.

Apparently you missed the memo.
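For anyone who wants to check their own logs, here is a minimal sketch of the tally, assuming you have already pulled the user-agent strings out of your access log. The marker list and the function names are made up for illustration and nowhere near complete. As noted above, anything that shows up as MSIE or Firefox only claims to be human, so the count of admitted bots is a floor, not the real number.

```python
from collections import Counter

# Hypothetical substrings for illustration only; a real list would be far longer.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "nutch", "heritrix")

def classify(user_agent: str) -> str:
    """Label a hit 'bot' if its user agent admits to being one, else 'claims_human'."""
    ua = user_agent.lower()
    return "bot" if any(marker in ua for marker in BOT_MARKERS) else "claims_human"

def tally(user_agents):
    """Count hits per label for an iterable of user-agent strings."""
    return Counter(classify(ua) for ua in user_agents)
```

Feed `tally` one user-agent string per hit and compare the two counts. Whatever lands under `claims_human` still includes every scraper that lies about what it is.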


Anonymous said...

From: The Bot Overlords
To: Internets

We love you. Feed us.

httpwebwitch said...

IncrediBill you are my hero, you say what the rest of us are afraid to say because we'd get our mouths rinsed with soap.

Fantastic rant.

Ian Turner said...

Looks like you are in need of a good IP database.

And like you I think that the impact of some is a real pain, especially those picsearch ones trawling for fucking copyright images - yes I've got 6 Gig of images on the site - and yes I have copyright on or permission to use every last one. They don't need downloading by parasite bots every month to check that they haven't changed.

Well ranted - you nearly set me off on one there :)

Ros said...

Love the last line. I'm starting to feel that way too when I look at my logs, it's insane.

I remember when Brett Tabke mentioned WMW having a 50 to 1 bot-to-human ratio at one point, I thought it was awful. I think a lot more websites are at that point now.

angsuman said...

It is amazing how much bandwidth we waste supporting these spiders every day.

BTW: Your name sounds incredibly familiar (no pun intended), I just can't place where...

Anonymous said...

Shit. I missed the first dot bomb, but this post certainly burst my own little bubble. And Google Analytics got my hopes up.

I hate you now.

However...I am intrigued by your views and wish to subscribe to your newsletter. One more RSS pounding, coming your way.

peterg22 said...

There are no fucking people on your site. The internet is now run for and used exclusively by bots

Ah, not strictly true .. I get bots and hackers! You're right though - I don't seem to get real people visiting now and can't recall the last time someone left a comment on my blog other than to sell me something. *sigh*

Anonymous said...

so true..

Christopher said...

Sublime, just sublime. Thank you for writing this. :D

Going to have to keep you on my (VERY small) list of feeds now for more little bits of excellence like this!

Alphane Moon said...

Woohoo! My site has a more than 50:1 bot-to-human ratio. I'm essentially writing for the machines.

Do you have an idea why they keep asking for a file named "transformers.txt"? That's not quite right ...

_ck_ said...

You definitely have a point. I was trying to make a list of the most active bots but I quit at 100 of them. A few years ago 100 might cover it.

Then there are the ones that don't identify themselves at all and use a generic useragent. It's one thing if they take just your front page to see what the site is about but another when it's 100+ pages in under a minute.
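Since those ones won't say who they are, request rate is about the only signal left. A rough sketch of a sliding-window check per IP follows, assuming hits arrive in time order with timestamps in seconds. The `RateFlagger` name and the 100-hits-in-60-seconds default are illustrative, lifted from the "100+ pages in under a minute" figure above, not a recommended threshold.

```python
from collections import defaultdict, deque

class RateFlagger:
    """Flag an IP once it makes max_hits requests inside a sliding time window."""

    def __init__(self, max_hits=100, window_seconds=60.0):
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps still inside the window

    def record(self, ip, timestamp):
        """Record one hit; return True if this IP is now at or over the limit.

        Assumes timestamps for a given IP never go backwards.
        """
        recent = self.hits[ip]
        recent.append(timestamp)
        # Drop hits that are a full window (or more) older than this one.
        while timestamp - recent[0] >= self.window:
            recent.popleft()
        return len(recent) >= self.max_hits
```

Call `record(ip, timestamp)` for every request. A bot that grabs just the front page never trips it, while one pulling 100 pages in under a minute does.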