Thursday, November 08, 2007

Another Stealth Crawler via Extended Host

Here we go with another stealth crawler operating from Extended Host:

194.110.162.19 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.225 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.227 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.228 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.231 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.84 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.85 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.86 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.87 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.88 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.89 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.92 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.93 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.94 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.96 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Here's the Extended Host IP range:
inetnum: 194.110.160.0 - 194.110.163.255
netname: EXTHOST-NET
descr: Extended Host
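
For anyone who wants to test addresses against that range, here is a minimal Python sketch. It assumes the inetnum boundaries above are the ones you want to block; the names BLOCKS and is_blocked are made up for illustration and are not part of any existing script.

```python
import ipaddress

# Range boundaries copied from the whois record above.
FIRST = ipaddress.IPv4Address("194.110.160.0")
LAST = ipaddress.IPv4Address("194.110.163.255")

# Collapse the inetnum range into the smallest list of CIDR blocks.
BLOCKS = list(ipaddress.summarize_address_range(FIRST, LAST))


def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any of the blocked CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKS)


if __name__ == "__main__":
    print([str(net) for net in BLOCKS])
    print(is_blocked("194.110.162.19"))
    print(is_blocked("194.110.164.1"))
```

The range collapses to a single CIDR block, which is the form most firewall and web server deny rules accept.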
They just keep coming and I just keep closing more holes they slither through.

9 comments:

Anonymous said...

Thanks for the tip. Only had one request so far from that network but better safe than sorry.

Anonymous said...

Added, thanks again.

After installing the "Great Firewall Of China", re-erecting the "Iron Curtain" and banning proxies, the site has taken off.

Results So Far:
- Non-human bot bandwidth down 66%
- Unique visitors up 15 to 20% per month
- eCPM doubled
- Top 10 for a two word search that returns 801,000,000 results

Anonymous said...

Good reading, ‘botnets’

IncrediBILL said...

Glad to hear it's working for you, Ban Proxies, as I had a similar experience when I shut out the asshole scrapers.

Anonymous said...

So far I've seen only positive results but, like you said ".. I just keep closing more holes they slither through."

I don't know why it took me so long to realize what was happening when the site was having problems. Then one day I was reading a thread about the impact of content theft (scrapers) and proxies that you were contributing to. A week later I had two scripts modified to produce log files recording the reasons why an IP was being denied access to the site. Bill, I almost fell off my chair when I saw those logs.

I don't ban single IPs. I trace IPs back to hosting data centers and ban the complete DC. Being brutally proactive has tripled my income and I haven't had the need to participate in a "penalty thread" since.


Recent garbage caught by the scripts today:
216.120.237.150 default/doctype.php
85.234.212.58 admin/admin_board.php
91.137.2.182 business_inc/list.php
66.249.27.10 business_inc/saveserver.php
87.106.178.19 business_inc/saveserver.php


Those files have never existed on my server. Every one of those files has a "Remote File Inclusion Vulnerability". I will let anyone reading this decide what they want to do.
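
For anyone wanting to run a similar check on their own logs, here is a rough Python sketch. It assumes each log line is just "IP path" as in the list above, and it is not the actual script mentioned here; PROBE_PATHS and find_probes are hypothetical names.

```python
# Hypothetical probe list, taken from the paths quoted above.
PROBE_PATHS = {
    "default/doctype.php",
    "admin/admin_board.php",
    "business_inc/list.php",
    "business_inc/saveserver.php",
}


def find_probes(lines):
    """Return {ip: [paths]} for requests that hit a known probe path."""
    hits = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not an "IP path" line
        ip, path = parts
        if path.lstrip("/") in PROBE_PATHS:
            hits.setdefault(ip, []).append(path)
    return hits


if __name__ == "__main__":
    sample = [
        "216.120.237.150 default/doctype.php",
        "87.106.178.19 business_inc/saveserver.php",
    ]
    for ip, paths in find_probes(sample).items():
        print(ip, paths)
```

A request for a file that never existed on the server is a cheap signal, since no legitimate visitor has a reason to ask for it.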

Once again, Thanks Bill

IncrediBILL said...

If you want to completely eliminate the problem you'll eventually need to block individual IPs because there are a ton of home-based scrapers coming from DSL and Cable services and if you block the entire service you'll lose traffic! ;)

Anonymous said...

For now I rely on the scripts (one of these scripts you contributed to :) to block single IPs. I go through the log files looking for repeat offenders and trace them back to data centers if possible. I devote two hours every Sunday to tracking down IPs for banning, and I do the same whenever you post a list of known problems.

New Topic

I'm sure you've heard of Attributor and what they are doing. Reuters is using Attributor.

<-- Time to rib Bill -->
A Crawlwall service coupled with Bill's very own version of Attributor would be a very viable product.

IncrediBILL said...

I go one better than Attributor as I don't have to crawl the web to find my content.

Google indexes the content and I can search on a single unique ID embedded in the content and up pops hundreds of copies all over the place.

Besides, when everyone is blocking bots thanks to scrapers and spammers, Attributor won't be able to crawl, so I wouldn't invest heavily in their technology.

IncrediBILL said...

BTW, I wrote about Attributor a year ago.

Maybe they're using Yahoo as their data source and not crawling at all, hard to say.