Friday, January 13, 2006

Forbidden or Trashed Response

Judging from the comments posted on various forums, the most common approach is to tell bad bots they've been forbidden from the server with a 403 error and hope that makes them go away.

However, if the scraper already has your content from previous scrapes, the better method may be feeding the bot placebo data so the idiots running it will just let it keep going until it has trashed all their previous copies of your content.

Gonna give that a whirl for a few weeks and see what happens, should have some angry scrapers soon ;)
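For anyone curious what the two responses might look like side by side, here's a rough sketch in Python. It is not the code running on this site. The marker list, the filler words, and the function names (is_bad_bot, placebo_page, respond) are all made up for illustration, and real bot detection takes a lot more than a user-agent match.

```python
import hashlib
import random

# Hypothetical markers; a real site would build its own list from its logs.
BAD_BOT_MARKERS = ("evilbot", "scraper")

# Meaningless filler used to build the placebo pages.
FILLER_WORDS = ["lorem", "ipsum", "dolor", "sit", "amet",
                "consectetur", "adipiscing", "elit"]


def is_bad_bot(user_agent):
    """Naive check: does the user agent contain a known bad marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BAD_BOT_MARKERS)


def placebo_page(path, words=50):
    """Build a junk page for a URL.

    The random generator is seeded from the path, so the same URL
    always returns the same junk and looks like a stable page.
    """
    seed = int(hashlib.sha256(path.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    body = " ".join(rng.choice(FILLER_WORDS) for _ in range(words))
    return "<html><body><p>" + body + "</p></body></html>"


def respond(user_agent, path, real_content, mode="placebo"):
    """Return (status, body) for a request.

    mode="forbid" sends bad bots a 403; mode="placebo" sends them
    a 200 with junk in place of the real content.
    """
    if not is_bad_bot(user_agent):
        return 200, real_content
    if mode == "forbid":
        return 403, "Forbidden"
    return 200, placebo_page(path)
```

With mode="forbid" the bot knows it got caught and stops, keeping whatever it already scraped. With mode="placebo" it sees a normal 200 and, if the scraper republishes whatever it fetches, overwrites the old copies with filler.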

4 comments:

Aaron Pratt said...

tell us more tell us more on the technical aspects of making this happen! (I am seriously interested).

strange thing, i just visited Jim Boykin's blog and got a 403, am I a bad bot?

IncrediBILL said...

You, my friend, are a very bad bot.

EvilBot v2.0 said...

Thank you very much! :-)

thebear said...

incrediBILL,

Sometimes, if you can figure out the site that the content is going to be featured on, you can feed it a copy of that site's homepage. Something about duplicate content comes to mind.

If the "bot" is actually an indexed open proxy and you do the above, Google will get plenty of duplicate content to filter ;-).

Or just a plain "blank" page full of hidden text might work, maybe sprinkle in some hidden links to p0rn sites.

Oh what fun.