People are always asking me how to build an OPT-IN .htaccess file, which I advocate, opposed to the traditional blacklist methods.
The problem with OPT-IN is it's VERY unforgiving and you really need to check your visitor stats and make sure you're letting in all the crawlers that are sending you traffic.
Belows is a bare bones sample of how it works and anything not in the list gets a 403 Forbidden error so you'll probably need to add more items and refine this for your particular website.
Sample .htaccess file for Apache 2.0:
#allow just search engines we like, we're OPT-IN onlyJust save the above as a file named ".htaccess" in your httpdocs or root web folder in your hosting account and all the crazy bots abusing your site will get bounced from now on.
#a catch-all for Google
BrowserMatchNoCase Googlebot good_pass
BrowserMatchNoCase Mediapartners-Google good_pass
#a couple for Yahoo
BrowserMatchNoCase Slurp good_pass
BrowserMatchNoCase Yahoo-MMCrawler good_pass
#looks like all MSN starts with MSN or Sand
BrowserMatchNoCase ^msnbot good_pass
BrowserMatchNoCase SandCrawler good_pass
#don't forget ASK/Teoma
BrowserMatchNoCase Teoma good_pass
BrowserMatchNoCase Jeeves good_pass
#allow Firefox, MSIE, Opera etc., will punt Lynx, cell phones and PDAs, don't care
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass
#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well
<Limit GET POST PUT HEAD>
deny from all
allow from env=good_pass
Remember, anything not listed will no longer have access so be careful and make sure everything your site needs allowed is in the list.