DomainTools Whois is now collecting and displaying more information than ever about our web sites. Their Whois display used to be limited mostly to public registration information: the Whois record, the IP address, where you host, and the basics. Then DomainTools expanded Whois a while back and started taking data straight from our domains without permission, and it doesn't even look at robots.txt to see if we want to participate. The screen shots were no big deal, but then they added an SEO text browser that allows people to snoop on your site, and who knows what's next.
If that wasn't enough, along came their Wiki companion site AboutUs.org, which scraped some data as well. AboutUs does seem to use robots.txt (see the backwards robots access below), but by the time you find out about the bot it's too late, because the scraped content is already on your domain's AboutUs Wiki page.
Enough is enough, it's official, I'm annoyed.
Since I could find no way to "opt-out" of all the new toys on DomainTools Whois I decided it was time to opt-out the old fashioned way and just block 'em.
If they had just identified themselves in the User Agent this would've been easy because those are all monitored on my main site automatically. However, it appears that DomainTools either doesn't know how to put their information in the User Agent field for the tools they use, or they really don't want to get snared and stopped easily, because they use standard Firefox and MSIE user agents for accessing your site.
However, note that the referrer does claim that it's coming from DomainTools, so you can at least use that as an indication it's them, although the User Agent field would've been preferred since it is the standard place for this sort of thing.
Here's a sample of DomainTools SEO Text Browser hitting your server:
66.249.16.212 "GET /" "http://whois.domaintools.com/somedomainname.com" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
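For anyone who wants to act on that referrer, here is a minimal sketch in Python of the check. The function name is my own invention for illustration, and it assumes you already have the raw referrer string from your log or request; it only tests whether the referrer's host is domaintools.com or one of its subdomains.

```python
from urllib.parse import urlsplit


def is_domaintools_referrer(referrer: str) -> bool:
    """Return True if the referrer's host is domaintools.com or a subdomain.

    Hypothetical helper: a referrer is trivially forged or omitted, so
    treat a match as a hint, not as proof.
    """
    # urlsplit() yields hostname=None for values like "-" or an empty string.
    host = (urlsplit(referrer).hostname or "").lower()
    return host == "domaintools.com" or host.endswith(".domaintools.com")
```

With the sample hit above, `is_domaintools_referrer("http://whois.domaintools.com/somedomainname.com")` is true, while an empty referrer logged as `"-"` is not.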
The SEO Text Browser looks like it might be telling the webmaster who's snooping on their site: when I was looking at the site through it, I caught it claiming to be a proxy that was forwarding requests for my IP address, so using it is far from anonymous!
66.249.16.211 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11" "/" Proxy Detected -> VIA=1.1 www.domaintools.com FORWARD=aaa.bbb.ccc.ddd
Of course your average webmaster would never see this proxy information because it's not in your default log file, but I log proxy details and a whole lot more.
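If you want to capture the same kind of detail, here is a rough sketch of how it might be done in a Python WSGI application. This is not my actual logging code: the function name is made up, and it assumes the VIA and FORWARD values shown above correspond to the `Via` and `X-Forwarded-For` request headers, which WSGI exposes as `HTTP_VIA` and `HTTP_X_FORWARDED_FOR`.

```python
def proxy_details(environ: dict) -> str:
    """Build a log fragment describing proxy headers, or "" if none are set.

    `environ` is a WSGI-style dict of request variables. The mapping of
    VIA/FORWARD to these two headers is an assumption for illustration.
    """
    via = environ.get("HTTP_VIA")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR")
    if not via and not forwarded:
        return ""
    # Use "-" as a placeholder when only one of the two headers is present.
    return f"Proxy Detected -> VIA={via or '-'} FORWARD={forwarded or '-'}"
```

Appending the returned string to each log line gives you a record of which proxy relayed the request and on whose behalf, provided the proxy chooses to send those headers.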
This is the DomainTools screen shot thumbnail generator hitting your site:
64.246.165.237 "GET /" "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"
Here's their companion site AboutUs.org, which claims it uses robots.txt but didn't bother to check whether I allowed them on my site until AFTER they had already been to it; the accesses came in exactly the order shown below.
66.249.16.207 "GET /" "-" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"
66.249.16.207 "GET /robots.txt" "http://www.somedomainname.com/" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"
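Spotting this kind of backwards access by hand gets old fast, so here is a small Python sketch that flags any client that requested a page before it ever asked for robots.txt. The function name and the `(client, path)` tuple format are assumptions for illustration; you would need to parse your own log format into that shape first.

```python
def fetched_before_robots(requests):
    """Return clients that requested a page before fetching /robots.txt.

    `requests` is an iterable of (client, path) tuples in log order.
    Clients are returned in the order they first offended.
    """
    seen_robots = set()
    offenders = []
    for client, path in requests:
        if path == "/robots.txt":
            seen_robots.add(client)
        elif client not in seen_robots and client not in offenders:
            # Page request from a client that has not read robots.txt yet.
            offenders.append(client)
    return offenders
```

Feeding it the two hits above, `("66.249.16.207", "/")` followed by `("66.249.16.207", "/robots.txt")`, flags 66.249.16.207.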
You might want to block AboutUsBot unless you want them to freely license whatever shit they scrape off your site with the claims on the bottom of their site:
All content is available under the terms of the GFDL and/or the CC By-SA License
If you want to keep them from snooping your site the IPs I'm currently blocking are:
66.249.16.*
66.249.17.*
64.246.165.* (screen shots)
So that's all I know at this time. You have robots.txt and htaccess files, you know what to do.
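If you would rather do the blocking in application code than in htaccess, here is a minimal Python sketch using the standard `ipaddress` module. It assumes each `*` wildcard above means a full /24 network, and the names `BLOCKED_NETWORKS` and `is_blocked` are hypothetical.

```python
import ipaddress

# The three ranges listed above, written as /24 networks (an assumption
# about what the "*" wildcards are meant to cover).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("66.249.16.0/24", "66.249.17.0/24", "64.246.165.0/24")
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Both sample visitors from the logs, 66.249.16.212 and 64.246.165.237, come back as blocked. Extending the list is just a matter of adding another network string.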
UPDATE: They are also running screenshots from 216.145.16.*
Wonder how many other blocks of IPs they're using?
UPDATE UPDATE: I was accused of confusing DomainTools and AboutUs.org!
I was never confused, but whoever posted that I was apparently is, just because I lumped them together. I did that because they operate from the same IP space, their whois records have the same address, and they have some shared data in common, such as the thumbnails.
AboutUs uses the thumbnails from DomainTools, and DomainTools Whois has a link from every domain to "AboutUs: Wiki article on ...", so what would you call them?
I never said they were the same company, totally not confused, but whatever makes you happy.
UPDATE UPDATE UPDATE: I knew the connections would be spelled out somewhere on the 'net when I had a little more time to do some snooping on the site.
One of the questions posed was about our connection with Name Intelligence. Jay Westerdal, CEO of NameIntelligence.com, in fact, recently stepped down as AboutUs CTO...
Confuse THAT!