Tuesday, January 01, 2008

DomainTools Whois and AboutUs Site Accesses Revealed

DomainTools Whois is now collecting and displaying more information than ever about our websites. Their Whois display used to be limited mostly to public registration information: the Whois record, the IP address, where you host, and other basics. A while back, DomainTools expanded Whois and started taking data straight from our domains without permission, and it doesn't even check robots.txt to see whether we want to participate. The screenshots were no big deal, but then they added an SEO Text Browser that lets people snoop on your site, and who knows what's next.

As if that wasn't enough, along came their companion wiki site, AboutUs.org, which scraped some data as well. AboutUs does seem to honor robots.txt (see the backwards robots.txt access below), but by the time you find out about the bot it's too late: scraped content is already sitting on your domain's AboutUs wiki page.

Enough is enough. It's official: I'm annoyed.

Since I could find no way to opt out of all the new toys on DomainTools Whois, I decided it was time to opt out the old-fashioned way and just block 'em.

If they had just identified themselves in the User Agent, this would've been easy, because user agents are all monitored automatically on my main site. However, DomainTools either doesn't know how to put its information in the User Agent field for the tools it uses, or it really doesn't want to be snared and stopped easily, because it uses standard Firefox and MSIE user agents to access your site.

Note, though, that the referrer does claim the request is coming from DomainTools, so you can at least use that as an indication it's them, although the User Agent field would've been preferable since it's the standard place for this sort of thing.

Here's a sample of the DomainTools SEO Text Browser hitting your server:

66.249.16.212 "GET /" "http://whois.domaintools.com/somedomainname.com" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
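
Since the User Agent is useless here, the referrer is the one field to key on. A minimal sketch in Python, assuming the trimmed log format shown in these samples (IP, then quoted request, referrer, and user agent) rather than a full Apache combined log; the set of suspect referrer hosts is a hypothetical example. Keep in mind a referrer is trivially spoofed or stripped, so this only catches requests that announce where they came from.

```python
import re
from urllib.parse import urlparse

# Matches the trimmed format used in these samples:
#   IP "REQUEST" "REFERRER" "USER-AGENT"
# A real Apache "combined" log has extra fields (timestamp, status, bytes)
# and would need a longer pattern.
LOG_RE = re.compile(r'^(\S+) "([^"]*)" "([^"]*)" "([^"]*)"$')

# Hypothetical list of referrer hosts treated as DomainTools tools.
SUSPECT_REFERRER_HOSTS = {"whois.domaintools.com"}


def parse_line(line):
    """Return (ip, request, referrer, user_agent), or None if the line doesn't match."""
    m = LOG_RE.match(line.strip())
    return m.groups() if m else None


def is_domaintools_referral(line):
    """Flag a request whose referrer host is a known DomainTools host.

    The user agent is ignored because these tools send ordinary browser UAs.
    """
    parsed = parse_line(line)
    if parsed is None:
        return False
    _ip, _request, referrer, _ua = parsed
    host = urlparse(referrer).hostname  # None for "-" or empty referrers
    return host in SUSPECT_REFERRER_HOSTS
```

Run it over each log line and hand the flagged IPs to whatever blocking mechanism you already use.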
The SEO Text Browser looks like it might also tell the webmaster who's snooping on their site, because I caught it claiming to be a proxy and forwarding my IP address while I was looking at the site through it. Using it is far from anonymous!

66.249.16.211 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11" "/" Proxy Detected -> VIA=1.1 www.domaintools.com FORWARD=aaa.bbb.ccc.ddd

Of course, the average webmaster would never see this proxy information because it isn't in the default log file, but I log proxy details and a whole lot more.
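
The proxy details come from request headers, not the default log line. A minimal sketch, assuming the VIA and FORWARD values in the log above correspond to the standard `Via` and `X-Forwarded-For` HTTP headers, and that your code receives request headers as a plain dict; the function name is hypothetical.

```python
def detect_domaintools_proxy(headers):
    """Return the forwarded client IP if the request came through a DomainTools proxy.

    `headers` is a plain dict of request headers. Returns None when the Via
    header doesn't mention domaintools.com, and "unknown" when it does but
    no X-Forwarded-For value is present.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lower = {name.lower(): value for name, value in headers.items()}
    via = lower.get("via", "")
    if "domaintools.com" not in via.lower():
        return None
    # X-Forwarded-For may hold a comma-separated chain; the first entry is
    # the original client as reported by the first proxy.
    forwarded = lower.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or "unknown"
```

Under CGI or WSGI the same headers arrive in the environment as `HTTP_VIA` and `HTTP_X_FORWARDED_FOR`.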

This is the DomainTools screenshot thumbnail generator hitting your site:

64.246.165.237 "GET /" "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"

Here's their companion site, AboutUs.org, which claims to honor robots.txt but didn't bother to check whether I allowed it on my site until AFTER it had already visited; the accesses occurred in exactly the order shown below.

66.249.16.207 "GET /" "-" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"

66.249.16.207 "GET /robots.txt" "http://www.somedomainname.com/" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"
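
To spot this kind of backwards access in your own logs, you can check whether a bot requested anything before its first robots.txt fetch. A rough sketch, assuming you've already parsed the log into (ip, path, user_agent) tuples in arrival order; the function name is illustrative, and it keys on a user agent substring, so it only works for bots that identify themselves the way AboutUsBot does.

```python
def pages_before_robots(entries, ua_token):
    """Return the paths a bot requested before it first fetched /robots.txt.

    `entries` is an iterable of (ip, path, user_agent) tuples in log order.
    `ua_token` is a substring identifying the bot, e.g. "AboutUsBot".
    If the bot never fetched robots.txt, every path it requested is returned.
    """
    early = []
    for _ip, path, user_agent in entries:
        if ua_token not in user_agent:
            continue  # some other visitor; ignore
        if path == "/robots.txt":
            break  # robots.txt checked; later requests don't count
        early.append(path)
    return early
```

Anything this returns is a page the bot grabbed before it could have known whether it was welcome.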

You might want to block AboutUsBot unless you want them to freely license whatever shit they scrape off your site, per the notice at the bottom of their pages:
All content is available under the terms of the GFDL and/or the CC By-SA License
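
Since AboutUsBot does identify itself, a robots.txt rule is the polite first step, although given the access order above it won't stop the first visit. A sketch of such a rule, assuming the bot matches on the `AboutUsBot` token, checked with Python's standard `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that shuts out AboutUsBot but leaves other crawlers alone.
ROBOTS_TXT = """\
User-agent: AboutUsBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() matches the User-agent lines against the crawler's name.
print(parser.can_fetch("AboutUsBot", "http://www.somedomainname.com/"))  # False
print(parser.can_fetch("Googlebot", "http://www.somedomainname.com/"))   # True
```

In CPython's implementation, `can_fetch()` only looks at the part of the agent string before the first `/`, so pass the bot's name rather than the full `Mozilla/5.0 (compatible; AboutUsBot/0.9; ...)` string, which would be cut down to `Mozilla`.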
If you want to keep them from snooping on your site, these are the IP ranges I'm currently blocking:
66.249.16.*
66.249.17.*
64.246.165.* (screen shots)
So that's all I know at this time. You have robots.txt and .htaccess files; you know what to do.
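
The blocking itself belongs in .htaccess or your firewall, but the range logic is the same anywhere. A minimal Python sketch, assuming you can get the client IP as a string; each `a.b.c.*` wildcard listed above is written as the equivalent `/24` CIDR block, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# The blocks listed above as CIDR ranges: "66.249.16.*" covers the same
# 256 addresses as 66.249.16.0/24.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("66.249.16.0/24"),
    ipaddress.ip_network("66.249.17.0/24"),
    ipaddress.ip_network("64.246.165.0/24"),  # screenshots
]


def is_blocked(ip_string):
    """Return True if the client IP falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(ip_string)
    except ValueError:
        return False  # not a valid IP; let other checks decide
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The two 66.249 blocks are adjacent, so `ipaddress.collapse_addresses()` would merge them into a single 66.249.16.0/23 if you prefer fewer rules.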

UPDATE:

They are also running screenshots from 216.145.16.*
Wonder how many other blocks of IPs they're using?

UPDATE UPDATE:

I was accused of confusing DomainTools and AboutUs.org!

I was never confused; whoever posted that I was apparently thought so just because I lumped them together, which I did because they operate from the same IP space, their Whois records show the same address, and they share some data, such as the thumbnails.

AboutUs uses the thumbnails from DomainTools, and DomainTools Whois links every domain to "AboutUs: Wiki article on ...", so what would you call them?

I never said they were the same company. Totally not confused, but whatever makes you happy.

UPDATE UPDATE UPDATE:

I knew the connections would be spelled out somewhere on the 'net once I had a little more time to do some snooping on the site.
One of the questions posed was about our connection with Name Intelligence. Jay Westerdal, CEO of NameIntelligence.com, in fact, recently stepped down as AboutUs CTO...
Confuse THAT!

12 comments:

Anonymous said...

Thanks for the heads up.
You can block another CIDR that they are using:
216.145.16.0/24

IncrediBILL said...

Cool!

Thanks!

Anonymous said...

You do know that Name Intelligence is behind this crap?

Name Intelligence stealth crawling as Yahoo! Slurp.

They have recently stealth crawled as GoogleBot as well. Yawn.

Anonymous said...

If it walks like one, scrapes like one, smells like one, I don't care which *.com it is. Ban it :).

Thanks Bill

Anonymous said...

Some people eat lunch while others are "out to lunch" and some are lunched.

aboutus.org.

Lunch time, let's see who wants to sit at the table.

Anonymous said...

Hello - I have also seen more and more of those sites, and I hate them too: they scrape your personal or company info and then add some AdSense.

Maybe we should have a list of bad IPs.

zeus

Anonymous said...

http://www.google.com/support/forum/p/Webmasters/thread?tid=6e030b062545c66d&hl=en

GermanGuy said...

Thanks for your helpful post on DomainTools.

On http://www.google.com/support/forum/p/Webmasters/thread?tid=6e030b062545c66d&hl=en, Phil posted a log showing that 64.246.161.30 is also a DomainTools IP. This IP (along with others such as 64.246.178.34, 216.145.11.94 and 216.145.17.190) is also listed in this log: http://gewerbevereinriegel.de/ini.php

DomainTools' IP Explorer says something else:
http://www.domaintools.com/reverse-ip/explorer.html?ip=64.246.161.30
whereas the IP addresses you posted clearly belong to this **** company, according to the IP Explorer.

If you look up all the IPs I have posted above, you'll see that they belong to Compass Communications Inc. Searching for this company, I found this forum: http://forum.statcounter.com/vb/showthread.php?t=18495
And - what a surprise - this leads to whois.sc = domaintools.

Maybe you could analyze these things in more detail. I would be glad to read about it on your blog.

Anonymous said...

I emailed them to ask why my personal information was exposed on the net and asked them to take it down. They wrote back a HUGE explanation (see below) saying it was my fault, not theirs; that they are a third party who was given the information; and that they don't plan on taking it down, but if I wanted to block it, I could pay for a "History Block" at $10 per domain per day. Total BS.

"When you registered your domain without privatizing your domain a Whois record was created that can be found on many whois sites and on Google.

http://www.domaintools.com/learn/help/whois/

All the information we display on our website is legally-mandated, public information and easily obtainable by anyone from the web. DomainTools is only a third party displaying this information. Historical/cached information is never deleted from the world wide web.

WHOIS services provide public access to data on registered domain names, which currently includes contact information for Registered Name Holders. The extent of registration data collected at the time of registration of a domain name, and the ways such data can be accessed, are specified in agreements established by ICANN for domain names registered in generic top-level domains (gTLDs). For example, ICANN requires accredited registrars to collect and provide free public access to the name of the registered domain name and its nameservers and registrar, the date the domain was created and when its registration expires, and the contact information for the Registered Name Holder, the technical contact, and the administrative contact.

It is unfortunate that your Registrar consultant did not explain to you in detail the privatization options that are available to everyone, when they first register a domain on the world wide web. This would have prevented any record that would potentially contain personal information, from ever having been created. We understand that the records we display may contain personal data, however WE are not responsible for that record having been created.

Going forward you can contact the ICANN registrar listed on your record for information regarding privatization of your WHOIS record. The ICANN registrar may be able to assist you in changing or swapping unlisted numbers or other private contact information. Once you have updated the information with your registrar, the information we show will also update accordingly. In most cases it takes 24-48 hours for information on whois services to reflect updates.

It is not our policy to remove whois records from our database. We do, though, understand your concern. Whois privacy services should be available from your own registrar. Our History Block product is also an excellent tool to obscure the history of your domain name for a limited time (e.g., during a specific business transaction).
You can purchase your History Block for your domain at http://www.domaintools.com/learn/help/whois-history/"

Anonymous said...

Why doesn't someone set up a class action lawsuit against DomainTools.com?

Anonymous said...

How do you block IPs with Blogger? I want to follow your tips in this post and would like to know how to do it on my blog.

Thanks

VSagar said...

I have checked my site, vsagar.com, on whois.sc many times.
Everything shows up OK, but it does not show a screenshot of my website.

What should I do to get a screenshot of my website displayed there?