Tuesday, November 28, 2006

BDFetch Plays By The Rules

I'm usually slamming corporate bots, but when a company like Brandimensions appears to be playing by all the rules, I feel it should get a little praise.

Here's what their access attempts look like:

209.167.50.22 "GET /robots.txt HTTP/1.1" "www.brandimensions.com" "BDFetch"
209.167.50.22 "GET /somepage.html HTTP/1.1" "www.brandimensions.com" "BDFetch"
209.167.50.22 "GET /robots.txt HTTP/1.1" "www.brandimensions.com" "BDFetch"
209.167.50.22 "GET /somepage.html HTTP/1.1" "www.brandimensions.com" "BDFetch"
209.167.50.22 "GET /robots.txt HTTP/1.1" "www.brandimensions.com" "BDFetch"
209.167.50.22 "GET /somepage.html HTTP/1.1" "www.brandimensions.com" "BDFetch"

At least they asked for robots.txt and appear to only go in when allowed.
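
For anyone who wants to check what a given robots.txt actually permits for this bot, here's a minimal sketch using Python's standard urllib.robotparser. The robots.txt rules below are made up for illustration (they are not from my site or from Brandimensions), and I'm assuming the bot matches on the "BDFetch" token shown in the logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: BDFetch
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """Return True if the rules let this user agent fetch the path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for agent, path in [
        ("BDFetch", "/somepage.html"),
        ("BDFetch", "/private/report.html"),
        ("SomeOtherBot", "/somepage.html"),
    ]:
        print(agent, path, is_allowed(rules, agent, path))
```

With these sample rules, BDFetch gets everything except /private/, and every other bot is shut out.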

However, they had a few bumps that I'd like to see them fix.

1. Ask for robots.txt once or twice a day, maybe once an hour worst case, not on every access.

2. Set your reverse DNS to say bdfetch.brandimensions.com or something similar so we can verify it's really your company and not someone spoofing you.

3. Include a link to a page about your crawler in the user agent, and a version number, such as "BDFetch/1.0 +http://www.brandimensions.com/crawler.html"
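
To show why the reverse DNS suggestion matters, here's a rough sketch of the check a webmaster could run once it's in place: look up the hostname for the IP, make sure it falls under the company's domain, then resolve that hostname forward and confirm it points back to the same IP. The function name verify_crawler_ip is my own, the hostname bdfetch.brandimensions.com is only the one suggested above (not something that exists today as far as I know), and the lookups are passed in as parameters so the logic can be tried without touching real DNS.

```python
import socket


def verify_crawler_ip(
    ip,
    expected_suffix,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
):
    """Forward-confirmed reverse DNS check for a crawler's IP address.

    Returns True only if the IP reverse-resolves to a hostname under
    expected_suffix AND that hostname resolves back to the same IP.
    """
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        # No PTR record or lookup failure: cannot verify.
        return False

    hostname = hostname.rstrip(".").lower()
    suffix = expected_suffix.lower()

    # Require an exact match or a subdomain of the expected domain, so
    # "brandimensions.com.evil.example" does not slip through.
    if hostname != suffix and not hostname.endswith("." + suffix):
        return False

    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # Anyone can set a PTR record to any name, so the forward lookup
    # must agree with the original IP.
    return ip in addresses


if __name__ == "__main__":
    # Uses real DNS; the result depends on how the records are set up.
    print(verify_crawler_ip("209.167.50.22", "brandimensions.com"))
```

The forward lookup is the important half: whoever controls an IP block can make its reverse DNS claim to be anybody, but they can't make the real domain resolve back to their address.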

Other than those minor glitches, kudos for trying to play by the rules and at least giving webmasters the choice of whether to let you crawl.

Nicely done.

3 comments:

Anonymous said...

Thanks for your kudos. I'll forward your blog entry to my colleagues at Brandimensions -- it'll make their day.

One of our company's core values is to govern ourselves with integrity and professionalism. Our crawler playing by the rules and being a good Internet "citizen" is a result of this tenet.

Your suggestions make sense. Stay tuned to your web logs over the next few weeks...

Hugh Hyndman
CTO
www.brandimensions.com

Anonymous said...

Kudos are well deserved; however, their crawler reports "www.brandimensions.com" as the referring URL. Other robots (Yahoo, Google, etc.) simply have a "-" for the referring URL. That would be _much_ preferred. Unfortunately, I've set up an exclusion for their robot because of this problem.

Unknown said...

I'm seeing two types of traffic coming from the same IP address as BDFetch: one with user agent 'BDFetch' doing a few page views a day, the other with user agent 'Mozilla/4.0' etc. doing 10,000 or so views a day.