Most webmasters spend a lot of time and effort marketing their website, or pay someone a lot of money to do it, yet skip a few common-sense things that keep lazy, nosy-assed SEOs and other competitors from quickly analyzing all your hard work and simply stealing what you've done.
You can't completely stop them, because much of the competitive information about who links to you is already public, collected by search engines and toolbars, but you can sure as hell make it a little more difficult for them to get the rest of the data they want.
Since the SEO Chicks published a list of competitive research tools to help those nosy SEOs snoop, it seemed only fair and useful to have a nice list of ways to stop as many of those snooper tools as possible.
Block Archive.org - No need to let anyone see how your site evolved, or snoop and scrape through archive pages without your knowledge, so block their crawler in robots.txt:
User-agent: ia_archiver
Disallow: /
Rumor has it that ia_archiver may crawl your site anyway, so adding it to your .htaccess file is a good precaution as well.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
RewriteRule ^.* - [F,L]
Block Search Engine Cache - Some people cloak pages, showing the search engines raw text while visitors get the complete page layout. Who cares, that's your business and a competitive edge you don't need to share. Besides, pages can be scraped straight from the search engine cache as well, so disable caching on all pages.
Insert the following meta tag in the head of all your web pages:
<meta name="robots" content="noarchive">
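If you'd rather not touch every template, the same hint can be sent server-side as a response header instead. This is a sketch assuming Apache with mod_headers enabled; note that X-Robots-Tag header support varies by search engine, so the meta tag above is the safer bet:

```apache
# Assumes mod_headers is loaded; sends the noarchive hint as an HTTP header
# instead of a meta tag (header support varies by search engine).
<IfModule mod_headers.c>
Header set X-Robots-Tag "noarchive"
</IfModule>
```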
Block Xenu Link Sleuth - Why do you need people sleuthing your site? Screw 'em...
Add Xenu to your .htaccess file as well:
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu
RewriteRule ^.* - [F,L]
Make Your Domain Registration Private - Why give the SEOs or any other competitor any clues to help them whatsoever?
Sign up with DomainsByProxy and this anonymous block of text is all the nosy little bastards will ever see:
WHATEVERMYDOMAINNAME.COM
Domains by Proxy, Inc.
DomainsByProxy.com
15111 N. Hayden Rd., Ste 160, PMB 353
Scottsdale, Arizona 85260
United States
Restrict Access To Unauthorized Tools - Use .htaccess to white list access to your site, allowing just the major search engines and the most popular browsers, which blocks many other SEO tools. If you don't understand the white list method and it scares you, there are a few good black lists around too.
This is a limited sample for informational purposes only, just to give an idea how it works; see the thread linked above for more in-depth samples by WebSavvy. Just be cautious in implementing a white list, as it's very restrictive:
#allow just search engines we like, we're OPT-IN only
#a catch-all for Google
BrowserMatchNoCase Google good_pass
#a couple for Yahoo
BrowserMatchNoCase Slurp good_pass
BrowserMatchNoCase Yahoo-MMCrawler good_pass
#looks like all MSN starts with MSN or Sand
BrowserMatchNoCase ^msnbot good_pass
BrowserMatchNoCase SandCrawler good_pass
#don't forget ASK/Teoma
BrowserMatchNoCase Teoma good_pass
BrowserMatchNoCase Jeeves good_pass
#allow Firefox, MSIE, Opera etc., will punt Lynx, cell phones and PDAs, don't care
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass
#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well
order deny,allow
deny from all
allow from env=good_pass
Disclaimer: I don't use .htaccess for much, so please don't ask for a complete file; this is just a sample, as I use a more complex real-time PHP script to control access to my site.
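For the curious, the real-time idea can be sketched in a few lines. This is a hypothetical Python version of the same white-list logic as the .htaccess sample above, not the script I actually run; the patterns are illustrative, not an exhaustive list of good bots:

```python
import re

# Hypothetical whitelist mirroring the .htaccess sample; patterns anchored
# with ^ must match at the start of the user agent, just like BrowserMatch.
GOOD_PASS = [
    r"Google", r"Slurp", r"Yahoo-MMCrawler",
    r"^msnbot", r"SandCrawler",
    r"Teoma", r"Jeeves",
    r"^Mozilla", r"^Opera",
]

def allowed(user_agent):
    """Return True only for white-listed agents; blank user agents get punted."""
    if not user_agent:
        return False
    return any(re.search(p, user_agent, re.IGNORECASE) for p in GOOD_PASS)
```

A script like this runs on every request, so anything fancier (IP checks, logging, throttling) can be layered in without touching .htaccess again.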
Block Bots and Speeding Crawlers - You can use something like the nifty PHP bot speed trap Alex Kemp has written, or Robert Plank's AntiCrawl. It's just another layer of security piled on against snoops and scrapers that pretend to be MSIE or Firefox to dodge the white list or black list blocking in .htaccess.
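If you'd rather roll your own, the core of a speed trap is just a sliding window of hits per IP. A minimal Python sketch, assuming a single-process server; the limits are made up for illustration, and real scripts like AntiCrawl persist state and add challenges instead of a flat block:

```python
import time
from collections import defaultdict, deque

WINDOW = 10.0   # sliding window in seconds (illustrative value)
LIMIT = 20      # max requests per IP inside the window (illustrative value)

_hits = defaultdict(deque)

def too_fast(ip, now=None):
    """Record a hit and report whether this IP just exceeded the speed limit."""
    if now is None:
        now = time.monotonic()
    q = _hits[ip]
    q.append(now)
    # Drop hits that have fallen out of the sliding window.
    while q and now - q[0] > WINDOW:
        q.popleft()
    return len(q) > LIMIT
```

Anything that trips `too_fast` is almost certainly a crawler, because no human clicks twenty pages in a couple of seconds.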
Block Snoops From Robots.txt - Don't allow anyone other than your white-listed bots to see your robots.txt file, because it has other stuff in it that SEO snoops might find interesting, and it can become a security risk. Use a dynamic robots.txt file like this perl script on WebmasterWorld, and just add the rest of your allowed bots to the code next to Slurp, Googlebot, etc.
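The same idea in a hypothetical Python sketch; the WebmasterWorld script is Perl, this just shows the concept, and the bot names and Disallow rules here are placeholders:

```python
# Serve the real robots.txt only to known good bots; everyone else gets
# a blanket Disallow so snoops learn nothing about your directory layout.
ALLOWED_BOTS = ("Googlebot", "Slurp", "msnbot", "Teoma")  # illustrative names

REAL_RULES = "User-agent: *\nDisallow: /private/\n"   # placeholder rules
DECOY_RULES = "User-agent: *\nDisallow: /\n"          # what snoops see

def robots_txt(user_agent):
    """Return the real rules for white-listed bots, the decoy for everyone else."""
    ua = (user_agent or "").lower()
    if any(bot.lower() in ua for bot in ALLOWED_BOTS):
        return REAL_RULES
    return DECOY_RULES
```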
Block DomainTools - Since SEOs use it to snoop, there's no reason to let DomainTools have access, so just block 'em.
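If you're using the .htaccess approach, a rule along these lines would do it. Note that SurveyBot as the DomainTools crawler's user-agent is an assumption on my part; check your own access logs for what actually hits your site:

```apache
# Hypothetical: block the DomainTools crawler by user-agent string.
# Verify the actual UA in your own access logs before relying on this.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} SurveyBot [NC]
RewriteRule ^.* - [F,L]
```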
There are probably lots of other things you should be blocking as well, but this will give you a good start.
This list doesn't completely stop snoops from manually looking at your site, but it certainly stops all of those automated tools from ripping through your pages or the search engine and archive caches and presenting a nice pretty report.
Heck, why should you help people take away your own money?
Start slowing them down today and stop the next up-and-comer from getting the info too easily.
UPDATE: One more creative thing you can do to your website is cloak the meta tags so that only the search engines see them, and disable the meta tags for normal visitors. There's nothing really wrong with this, because meta tags by definition are only for the search engines, and snooping SEOs will be completely left in the dark when they can't see your meta keywords or description.
Especially if you combine cloaking meta tags with the NOARCHIVE option described above, then it's completely hidden from prying eyes.
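The cloaking itself can be as simple as a conditional in your page template. A hypothetical Python sketch using naive user-agent matching; the bot names are illustrative, and serious cloaking setups usually verify bots by IP as well:

```python
# Emit meta tags only when the requesting user agent looks like a search
# engine; normal visitors (and snooping SEOs) get an empty string.
SEARCH_BOTS = ("Googlebot", "Slurp", "msnbot", "Teoma")  # illustrative names

META_TAGS = ('<meta name="robots" content="noarchive">\n'
             '<meta name="description" content="placeholder description">')

def meta_for(user_agent):
    """Return the meta tag block for search engines, nothing for everyone else."""
    ua = (user_agent or "").lower()
    if any(bot.lower() in ua for bot in SEARCH_BOTS):
        return META_TAGS
    return ""
```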