Tuesday, December 06, 2005

GoogleBot Ignores Robots.txt!

Here's a big shocker: two days after I built my new spider trap, Google ignored my robots.txt entry and snared itself in the trap.

Simple robots.txt, nothing hard about this test:

User-agent: *
Disallow: /rogue_bots.html

Sure enough, my spider trap log for that page shows:

SPIDER AGENT=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Bad, bad Google, tsk tsk.
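
For anyone who wants to double-check that the robots.txt rule above really does cover the trap page, here is a minimal sketch using Python's standard urllib.robotparser. The example.com host and the is_allowed helper are assumptions made up for illustration; only the two robots.txt lines and the user-agent string come from the post. It shows how a compliant parser should read the rule, not how Googlebot actually processes it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /rogue_bots.html
"""

# The user-agent string recorded in the spider trap log.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: may user_agent fetch url under robots_txt?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not the real site.
    for path in ("/rogue_bots.html", "/index.html"):
        url = "http://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, GOOGLEBOT_UA, url))
```

Under the wildcard rule, a well-behaved crawler should treat /rogue_bots.html as off limits and everything else as fair game, which is exactly why a hit on the trap page is worth logging.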

