There are a lot of recent posts from people reaching a near hysteria fever pitch over what appears to be Live.com scouring the 'net looking for black hat sites doing things like cloaking or worse.
What they're all posting about is that MS Live.com seems to be doing some stealth crawling, sending bogus query strings to find pages that change their response based on the query. That's what cloaked web sites do: they display advertising related to the topic that brought you to the page.
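To make that idea concrete, here's a minimal sketch of what such a probe might look like. It's my guess at the general technique, not anything Microsoft has confirmed: the function names, the `q` parameter and the made-up probe term are all hypothetical, and `fetch` is whatever callable you supply that turns a URL into page text.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_bogus_query(url, term="zzqx-test"):
    """Return url with a made-up search term appended (hypothetical 'q' parameter)."""
    parts = urlsplit(url)
    extra = urlencode({"q": term})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def responds_to_query(fetch, url, term="zzqx-test"):
    """Guess whether a page tailors its content to the query string.

    fetch is any callable mapping a URL to the page body as text.
    The page is flagged only if the bogus term shows up in the probed
    response but not in the plain one.
    """
    plain = fetch(url)
    probed = fetch(with_bogus_query(url, term))
    return term in probed and term not in plain
```

A real checker would need to cope with pages that legitimately echo search terms, so treat a positive result as a hint and nothing more.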
However, I've seen a few thousand other mysterious page requests from that IP range that most of you probably haven't noticed. I'll share them below; they may or may not be related, it's hard to say at this point.
Sometimes, but not always, the IP address claims to be coming via a proxy such as:
1.1 SEA-PRXY-02
1.1 SEA-PRXY-01
"1.1 NET-PRXY-03, 1.1 NET-PRXY-04"
1.1 NET-PRXY-04
1.1 RED-PRXY-30
... and more
Maybe some of this is unrelated, maybe it's totally relevant, who knows except MS and they aren't telling. However, starting as far back as 01/07/2007 my bot blocker started trapping what appeared to be stealth crawl activity in the 131.107.*** range:
01/07/2007 131.107.0.96
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)"
01/12/2007 131.107.0.95
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)"
01/15/2007 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)"
Then it appears a human responded to a bot challenge:
01/15/2007 15:56:38 RESPONSE 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)"
Then this BLANK user agent started hitting on the same day:
01/15/2007 131.107.0.86 ""
Then the sudden challenges and responses on 131.107.0.104 happened again, so maybe that really was a human behind at least one of those proxies, who knows.
The blank UA on 131.107.0.86 kept asking for thousands of pages for many weeks, including "/robot.txt", which made me giggle.
In the middle of all this there's this little nugget:
03/29/200 131.107.0.96 "Wget/1.8.1"
Then in March there's another rash of challenges in 131.107.0.* and a single response on 131.107.0.104:
04/28/2007 RESPONSE 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"
What does it all mean? No clue yet...
Suddenly, after months, the blank UAs on the 131.107.0.104 megacrawl seem to come to a close.
Then we get this little gem:
05/30/2007 131.107.0.95 "LWP::Simple/5.805"
June has a mix of challenges and a couple of responses, so humans may use that IP block every now and then.
Then these nuggets pop up:
07/10/2007 131.107.0.95 "Java/1.6.0_01"
07/10/2007 131.107.0.96 "Wget/1.8.1"
07/13/2007 131.107.0.86 "" the blank UA starts crawling again.
Blank UA shows up on other IPs:
07/23/2007 131.107.0.101 ""
07/23/2007 131.107.0.104 ""
07/23/2007 131.107.0.96 ""
07/24/2007 131.107.0.73 ""
07/26/2007 131.107.0.96 ""
07/27/2007 131.107.0.95 ""
Now one IP with blank UA crawls a few days:
10/16/2007 to 11/05/2007 131.107.0.104 ""
Then the Perl crawl begins:
11/15/2007 131.107.0.96 "libwww-perl/5.805"
11/16/2007 131.107.0.95 "libwww-perl/5.805"
And those last two IPs are still crawling as "libwww-perl/5.805" as I write this.
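For anyone who wants to sift their own logs the same way, here's a rough sketch of how the entries above could be bucketed. It assumes the 131.107.0.* range can be written as a /24, and the labels and the `classify` helper are hypothetical names of my own; the tool prefixes are simply the user agents quoted in this post.

```python
import ipaddress

# Assumption: the 131.107.0.* range discussed in the post, expressed as a /24.
SUSPECT_NET = ipaddress.ip_network("131.107.0.0/24")

# Tool signatures taken from the user agents quoted in the post.
TOOL_PREFIXES = ("Wget/", "LWP::Simple/", "libwww-perl/", "Java/")


def classify(ip, user_agent):
    """Return a rough, illustrative label for one logged request."""
    if ipaddress.ip_address(ip) not in SUSPECT_NET:
        return "other"
    if user_agent == "":
        return "blank-ua"
    if user_agent.startswith(TOOL_PREFIXES):
        return "tool"
    return "browser-like"
```

Something like `classify("131.107.0.86", "")` would land in the blank UA bucket, while the MSIE strings fall through to "browser-like", which says nothing about whether a human was really behind them.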
When you add it all up, a couple of things come to mind: Microsoft is checking for cloaking, has some pet projects possibly being tested, and/or is checking to see how websites respond to a browser user agent vs. user agents that are normally blocked. It's probably a mix of all the above.
See the response from msndude msg#3442263 on WebmasterWorld:
First, we appreciate the concerns and issues that have been raised and apologize for any incovenience this might have caused.
Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your conerns, we would request that you do not actively block the IP addreses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.
Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.
Please tell me what gives you the right to scan thousands of pages without permission and then threaten to dump our ass if we don't let you run rampant without control over our website?
That's some pretty big balls even for Microsoft!
Since it's annoying some people for no sane reason I say go block the IP range and go back to sleep because Microsoft doesn't send enough traffic to put up with this abuse in the first place.
Besides, Microsoft has some damned explaining to do before they have any room to bully people as I've got quite the list of documented abuse from that IP range that would justify anyone blocking the bad behavior exhibited on 131.107.0.*.
That's my $0.02.