Wednesday, February 22, 2006

Search Engines Let Scrapers Bypass Spider Traps!

Just when you thought you'd seen it all, the search engines themselves can be used by scrapers to bypass spider traps. Here's how it works: the scraper pulls all of your site's indexed page names out of Google or Yahoo, then downloads those pages directly using the known names, side-stepping spider traps entirely because it never actually spiders your site at all.

Therefore, just blocking your pages from being CACHED in the search engines doesn't stop scrapers, since they can still use the remaining index data to their advantage.
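To see why the traps never fire, here's a minimal sketch of the technique from the scraper's side. The URL list stands in for page names harvested from a search engine's "site:" results; example.com and the paths are placeholders, not real targets.

```python
# Hypothetical sketch: a scraper that never follows links on the target
# site, so link-based spider traps (hidden honeypot URLs) are never hit.
from urllib.parse import urljoin

def plan_requests(indexed_paths, base="http://example.com"):
    """Turn page names taken from a search index into direct fetch URLs."""
    return [urljoin(base + "/", p) for p in indexed_paths]

# Stand-in for results scraped from a "site:example.com" query:
indexed = ["products.html", "about.html", "articles/widgets.html"]
for url in plan_requests(indexed):
    print(url)  # the scraper would GET each URL directly, no crawling
```

The point is that every request goes to a page the search engine already proved exists, so the hidden trap URLs that only a link-following spider would stumble into are simply never requested.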

Some days it just doesn't pay to get out of bed.

3 comments:

baraqyal said...

You should start writing spider software since you know all the tricks.

You could make millions!

IncrediBILL said...

Writing spider software?

Are you insane?

I want to use my power for good, not evil!

Dan Kramer said...

You could always deny access to any non-search-engine bot user agent that doesn't present a whitelisted HTTP_REFERER header.
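Dan's suggestion could be sketched roughly like this — a minimal, hypothetical access check where the bot names, the referer whitelist, and the `allow_request` helper are all illustrative, not a real implementation (and a determined scraper can of course forge both headers):

```python
# Hypothetical sketch: pass known search-engine spiders by User-Agent,
# and require anything else that self-identifies as a bot to present a
# whitelisted Referer. All names and lists here are placeholders.
SEARCH_BOTS = ("googlebot", "slurp", "msnbot")      # Yahoo's crawler is Slurp
REFERER_WHITELIST = ("http://www.example.com/",)    # placeholder domain

def allow_request(user_agent, referer):
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in SEARCH_BOTS):
        return True                      # trusted search-engine spider
    if "bot" in ua or "spider" in ua or "crawl" in ua:
        # other self-identified bots need an approved referer
        return referer in REFERER_WHITELIST
    return True                          # ordinary browsers pass
```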