Sunday, October 28, 2007

Kavam's SearchMe Charlotte Taking Screen Shots?

SearchMe has been around for a while, but it looks like they are now taking screen shots.

For the novice looking at log files: any time you see Firefox for Linux methodically hitting pages over a long period of time, you can be almost certain that someone is making screen shots, especially when the IPs come from a data center.

Not only did I see screen shots being taken on my web pages, but I've seen their screen shot bot pulling images I have embedded on other web sites, so they're aggressively taking screen shots across the web.

Does the fact that they're taking screen shots mean that they're coming out of stealth mode and launching a new search service?

I'm speculating that this may be the case because taking screen shots is a very time-consuming process. It wouldn't make sense to take them and then let them all sit around going out of date unless you intended to go public with a new search service soon.

Here's the screen shot activity to look for in your web logs:

209.249.86.17 - "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.5) Gecko/20070728 Firefox/2.0.0.5"

That IP belongs to:

Kavam MFN-T595-209-249-86-0-24 (NET-209-249-86-0-1)
209.249.86.0 - 209.249.86.255

Other activity in that IP range:

01/02/2007 "Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.betaspider.com/)"
03/05/2007 "Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/)"
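
If you want to flag this activity automatically, a check along these lines is one way to do it. This is only a rough sketch: it assumes the netblock above is written as the CIDR 209.249.86.0/24 and that you have already pulled the client IP and user agent out of each log line. The function name `looks_like_searchme` and the substring tests are my own illustrative choices, not anything SearchMe documents.

```python
import ipaddress

# Netblock from the whois output above (209.249.86.0 - 209.249.86.255) as a CIDR.
KAVAM_NET = ipaddress.ip_network("209.249.86.0/24")


def looks_like_searchme(ip: str, user_agent: str) -> bool:
    """Hypothetical heuristic: request is from the Kavam netblock and uses
    either the Charlotte crawler UA or a Firefox-on-Linux UA."""
    try:
        in_range = ipaddress.ip_address(ip) in KAVAM_NET
    except ValueError:
        return False  # not a parseable IP address
    if not in_range:
        return False
    ua = user_agent.lower()
    is_charlotte = "charlotte/" in ua
    is_linux_firefox = "firefox/" in ua and "linux" in ua
    return is_charlotte or is_linux_firefox
```

Matching on the netblock first matters because the Firefox user agent by itself is indistinguishable from a real Linux visitor.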
Looks like Kavam is a legit company with funding and all that, but making screen shots without changing the user agent to say that's what they're doing is kind of lame. Very little is known about them other than that they built Wikiseek, which has nothing to do with why they are attempting to crawl and screen shot my main web site, so they obviously have something new in the works.

I've decided to block them temporarily until they come out of stealth and I can see what they're up to, because I don't need someone crawling a site with over 100K pages unless they give me a damn good reason ;)

5 comments:

Johann said...

As for Snap, they have a little something in their user agent: Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9

I'll take another look at SearchMe's netblock and see what I find.

Anonymous said...

Interesting.

Just out of curiosity, how do you detect that they are making a screen shot?

IncrediBILL said...

One huge clue is that they dump all cached images for each page load. A normal browser would retain the images in cache from the previous page, but a Firefox being used for screen shots has its cache wiped before each new screen shot is made.
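
As a rough illustration of that clue, here is a small Python sketch that measures how often one client re-requests an image it has already fetched. It assumes you have the ordered list of URL paths requested by a single IP; the function name `image_refetch_ratio` and the list of image extensions are hypothetical choices for this example, and it does not look at status codes, so conditional requests answered with a 304 would count as refetches too.

```python
from typing import Iterable

# Extensions treated as images (an assumption; adjust for your own site).
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png")


def image_refetch_ratio(paths: Iterable[str]) -> float:
    """Fraction of image requests that repeat an image this client already fetched.

    `paths` is the ordered sequence of URL paths requested by a single client.
    A caching browser should score near 0; a client that wipes its cache
    before every page scores higher when pages share images.
    """
    seen = set()
    total = 0
    repeats = 0
    for path in paths:
        # Ignore any query string when classifying the request.
        bare = path.split("?", 1)[0].lower()
        if not bare.endswith(IMAGE_EXTS):
            continue
        total += 1
        if bare in seen:
            repeats += 1
        seen.add(bare)
    return repeats / total if total else 0.0
```

A high ratio alone is not proof, since it only means something when the pages visited actually share images such as a logo or navigation buttons.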

Johann said...

Bill, the same can happen with regular browsers as well. I often see people with caching turned off completely.

Don't know why, maybe they think they'll see more up-to-date information?

IncrediBILL said...

Yes, but the same IP address slowly and methodically accessing pages 24x7 without sleep sure isn't a regular browser with the cache disabled.
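
To make the "24x7 without sleep" test concrete, here is a minimal Python sketch that checks whether one IP stayed active over a long stretch without ever pausing. The function name `never_sleeps` and both thresholds (two days of activity, no gap longer than four hours) are arbitrary assumptions picked for illustration, not tuned values.

```python
from datetime import datetime, timedelta
from typing import List


def never_sleeps(times: List[datetime],
                 min_span: timedelta = timedelta(days=2),
                 max_gap: timedelta = timedelta(hours=4)) -> bool:
    """True if one IP was active for at least `min_span` with no pause
    longer than `max_gap` between consecutive requests."""
    if len(times) < 2:
        return False
    ordered = sorted(times)
    if ordered[-1] - ordered[0] < min_span:
        return False  # not enough history to judge
    longest = max(b - a for a, b in zip(ordered, ordered[1:]))
    return longest <= max_gap
```

A person with caching turned off still goes to bed, so their longest gap will normally exceed the threshold even if every other signal looks the same.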