Tuesday, March 28, 2006

Almost Ten Percent of All Pages Blocked

The total number of pages being blocked from non-humans masquerading as browsers on my site is about 10% of all pages displayed daily, and this rate has been holding steady since I started blocking them.

That's right, for every 48K pages displayed daily, between 4K and 5K are going to scrapers, crap search engines that send no visitors, and other useless wastes of bandwidth. That's a heck of a lot of pages being scraped, and the purposes for all of this are still only slowly unfolding as each week something new shows up somewhere on the net.
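For anyone who wants to check the math behind the "almost ten percent" figure, here is a quick sketch. It assumes "K" means thousands of pages, and the function name is just made up for illustration.

```python
def blocked_share(blocked_pages: int, total_pages: int) -> float:
    """Return blocked pages as a percentage of all pages displayed."""
    return blocked_pages / total_pages * 100


# Assumption: "48K" means 48,000 pages per day, "4K-5K" means 4,000 to 5,000.
daily_pages = 48_000
for blocked in (4_000, 5_000):
    print(f"{blocked} of {daily_pages} pages = {blocked_share(blocked, daily_pages):.1f}%")
```

So the low end works out to a bit over 8% and the high end to a bit over 10% of daily pages.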

Simply amazing.

It will be interesting to get more sites profiled moving forward and see whether this is a common trend. If 10% of all site traffic is being wasted, not to mention server resources, blocking it could take a decent load off many servers starting to feel the pinch.

I'll get some more detailed stats together next week, hopefully, and put them up for everyone to take a gander at, as it's blowing my mind for sure.

4 comments:

Anonymous said...

Only 10%? I thought this stuff was a real problem. Since most of us have no problem dealing with + or - 10% of traffic, I'm not sure if that's worth the energy expenditure to block them for that little.

-wheel

IncrediBILL said...

You miss the point.

If it was all just legit search engines that used robots.txt we wouldn't even be having this discussion.

It's what they do with the 10% of the pages competing with you, or building a new business based on your content without permission, or the fact that they may ask for 1K pages in a few seconds and knock the server offline for minutes at a time that makes it all worthwhile.

Since blocking these idiots my SERPs are better than ever after regaining control over competing keywords using my content, and copyright issues from automated copying tools are barely a concern anymore.

I know I sure wouldn't run the site bare anymore, that's crazy.

P.S. Don't think it's 10% every day; some days it's a LOT more, and last month I saved almost 20GB in crawling.

Love London said...

been reading you with interest for a while recently. have been running a quiz on my site for a while, multipages, going from page to page (select answer by radio button).
some users are getting zero - very unlikely as the questions are easy, could this also be used as a kind of honeypot bot trap - they are clearly 'pressing' the form button but not answering the questions.

IncrediBILL said...

Jez,

Probably just someone clicking, as I've not run into any bots yet that follow form posts, but that doesn't mean they aren't doing it.

Typically bots just do GET requests, and I've even seen GETs for form pages, which is why I reject all form posts that aren't using POST to avoid spammers.
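The "reject anything that isn't a POST" idea could look roughly like the sketch below. This is not the actual code running on the site, just a minimal illustration; the `/quiz/submit` path, the `FORM_PATHS` set, and the `check_form_request` helper are all made-up names, and answering with 405 is one reasonable choice, not necessarily what the site does.

```python
from http import HTTPStatus

# Hypothetical list of URLs that only ever receive form submissions.
FORM_PATHS = {"/quiz/submit"}


def check_form_request(method: str, path: str) -> HTTPStatus:
    """Decide how to answer a request aimed at a form handler.

    A real browser submitting the form sends POST. A bot that just
    crawls URLs with GET gets turned away with 405 Method Not Allowed.
    """
    if path in FORM_PATHS and method.upper() != "POST":
        return HTTPStatus.METHOD_NOT_ALLOWED
    return HTTPStatus.OK


print(check_form_request("GET", "/quiz/submit").value)   # bot-style GET on the form handler
print(check_form_request("POST", "/quiz/submit").value)  # normal form submission
```

Keeping the check in one small function means it can sit in front of every form handler, and the rejected requests can be logged as likely bot traffic.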