Thursday, January 26, 2006

WebAbuse 2.0 Overnight Stats

Just to give some of you a hint that I'm not exaggerating about the amount of site crawling going on, here's a list of 122 different IPs and agents automatically blocked last night that collectively attempted to access many thousands of pages. Some seem to have specific targets and only go after a few pages every time they return, but others want to deep crawl the crap out of my site.

When you add it all up, the bots are sometimes accessing more pages than the actual site visitors, and this list doesn't even include the authorized bots.

FYI, don't be fooled by what you see: a user agent that looks legit means nothing, as a human can't click on and read 200 pages in 150 seconds. It's possible an innocent or two was snared, but considering what's at stake I don't really care anymore.
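For the curious, here's roughly what a speed check like that can look like. This is only a minimal sketch, not the actual blocking code: the `RateTracker` class, its method names, and the sliding-window approach are all assumptions for illustration, and the thresholds are simply the "200 pages in 150 seconds" figure from above.

```python
from collections import defaultdict, deque

# Hypothetical thresholds, borrowed from the "200 pages in 150 seconds" remark.
MAX_PAGES = 200
WINDOW_SECONDS = 150


class RateTracker:
    """Counts page requests per IP inside a sliding time window."""

    def __init__(self, max_pages=MAX_PAGES, window=WINDOW_SECONDS):
        self.max_pages = max_pages
        self.window = window
        # One queue of request timestamps per IP address.
        self.hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one page request; return True if this IP looks automated."""
        queue = self.hits[ip]
        queue.append(timestamp)
        # Forget requests that have fallen out of the window.
        while queue and timestamp - queue[0] > self.window:
            queue.popleft()
        # The user agent is never consulted: only speed matters here.
        return len(queue) >= self.max_pages


# Usage: feed it (ip, unix_time) pairs from the access log.
tracker = RateTracker()
for second in range(150):
    # Two requests per second from one address, far faster than a human reads.
    tracker.record("192.0.2.1", second)
    flagged = tracker.record("192.0.2.1", second)
```

Note the check never looks at the user agent string, so a crawler dressed up as MSIE trips it just like one that announces itself.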

This is the future, run for cover:

12.221.77.114 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
128.2.220.167 PrivacyFinder/1.1
130.158.81.39 Wget/1.10.1
131.107.0.84 SandCrawler - Compatibility Testing
134.96.1.195 AnswerBus (http://www.answerbus.com/)
137.43.154.203 NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html; nutch-agent@lists.sourceforge.net)
139.18.2.43 findlinks/1.1-a8 (+http://wortschatz.uni-leipzig.de/findlinks/)
142.167.88.250 internal zero-knowledge agent
144.131.251.29 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
151.24.66.200 Internet Explorer 5.5
162.40.193.253
172.169.142.20 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
172.203.82.76 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
193.165.250.22
193.42.229.3 NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
193.47.80.43 Exabot/2.0
194.167.196.3 Wget/1.10.2 (Red Hat modified)
194.67.3.21 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050920 Firefox/1.0.7
195.101.0.67
195.159.130.14 ZoomSpider - wrensoft.com
195.27.247.70 ColdFusion
195.37.209.45
195.39.234.162
195.70.35.179 KummHttp/1.1 (compatible; KummClient; Linux rulez)
196.209.78.70 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
201.230.91.192 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
201.26.110.67
202.165.102.186 SpiderMan
203.10.224.58
203.113.238.60
203.113.238.60 Random
205.209.169.222 MJ12bot/v1.0.7 (http://majestic12.co.uk/bot.php?+)
206.188.0.11 Jakarta Commons-HttpClient/3.0-rc2
207.148.212.242 PHP/4.1.2
207.171.172.6 Java/1.5.0_04
207.58.161.116
208.185.247.74 PageBitesHyperBot/600 (http://www.pagebites.com/)
209.131.61.1 NutchCVS/0.7 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
209.167.50.22 LinkWalker
209.178.137.175
209.18.119.138 Jakarta Commons-HttpClient/3.0-rc2
209.190.20.194 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
209.237.238.225 ia_archiver
210.17.148.245 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
210.173.180.156 ichiro/2.0 (http://help.goo.ne.jp/door/crawler.html)
211.5.60.108 RSS_READER (mctwist@mail.dr-k.info)
212.117.84.230 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
212.117.84.230 Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8) Gecko/20051111 Firefox/1.5
212.80.76.5 SeznamBot/1.1 (+http://fulltext.seznam.cz/)
213.133.123.154 libwww-perl/5.65
213.156.54.186 Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)
213.176.109.234 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
213.203.184.30 InetURL/1.0
213.42.2.11
216.195.47.98 Snoopy v1.2
216.22.48.28
216.247.238.226 VSE/1.0 (vivisimolog@web121.com)
217.212.224.142 psbot/0.1 (+http://www.picsearch.com/bot.html)
220.210.177.118 RSS_READER (mctwist@mail.dr-k.info)
221.116.237.114 NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
24.11.67.32 Java/1.5.0_06
24.177.134.6 aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)
24.19.240.172 Python-urllib/2.1
24.202.166.142 WebPix 1.0 (www.netwu.com)
24.216.179.135 Zeus 34366 Webster Pro V2.9 Win32
24.22.159.131 FyberSpider
24.242.26.149 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
24.5.187.223 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
24.57.8.78 EasyDL/3.04 http://keywen.com/Encyclopedia/Bot
38.113.234.181 voyager/1.0
58.64.126.5
61.135.131.173 sohu agent
62.163.40.65 Java/1.4.1_04
63.229.208.79 NextGenSearchBot 1 (for information visit http://about.zoominfo.com/PublicSite/NextGenSearchBot.asp)
64.127.124.159 OmniExplorer_Bot/5.85a (+http://www.omni-explorer.com) WorldIndexer
64.141.15.119 Wavefire/0.8-dev (Wavefire; http://www.wavefire.com; info@wavefire.com)
64.148.232.129 brfcaofenxv cdvP3k3xuesucrcxgPp3m
64.148.232.129 fWjnyc p ctcwmbbulcdeqw qew
64.164.63.175 Java/1.5.0_06
64.239.7.218 POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)
64.241.242.18 NutchCVS/0.05 (Nutch; http://www.nutch.org/docs/en/bot.html; nutch-agent@lists.sourceforge.net)
64.38.240.97 Roffle/l.ol(compatible; MSIE 6.0; Windows NT 5.0;
64.40.115.34 Python-urllib/1.16
64.5.245.27 genieBot (http://64.5.245.11/faq/faq.html)
64.94.163.151 Jakarta Commons-HttpClient/3.0
65.19.150.208 OmniExplorer_Bot/5.88 (+http://www.omni-explorer.com) WorldIndexer
65.24.45.49 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
66.117.176.20 Java/1.4.2_04
66.147.154.3 http://www.almaden.ibm.com/cs/crawler [fc14]
66.234.139.194 snap.com beta crawler v0
66.40.35.42 WWW-Mechanize/1.12
67.108.223.130 NextGenSearchBot 1 (for information visit http://about.zoominfo.com/PublicSite/NextGenSearchBot.asp)
68.127.10.143 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
69.0.235.24 Topular/1.0
69.238.36.166
69.41.14.5
70.124.116.68 FavOrg
70.34.224.188 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
70.49.144.182 Visual_Odyssey_Spider/3.0 (http://www.visualodyssey.com)
70.85.193.178 Poirot
71.102.140.247 envolk[ITS]spider/1.6 (+http://www.envolk.com/envolkspider.html)
71.137.197.195 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7
71.213.9.100 Lynx/2.8.5dev.7 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.6b
80.219.233.222 EmailSiphon
80.255.64.42 SIE-CX70/54 UP.Browser/7.0.2.2.d.3(GUI) MMP/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
80.77.86.240 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
81.1.87.163 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts)
81.155.34.158 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
81.19.66.38 StackRambler/2.0 (MSIE incompatible)
81.73.137.226 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
81.83.46.233 Googlebot/2.1(+http://www.googlebot.com/bot.html) (Googlebot/2.1(+http://www.googlebot.com/bot.html); MSIE; Windows; SV1)
82.120.57.235 Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8) Gecko/20051111 Firefox/1.5
82.131.195.52 LapozzBot/1.4 (+http://robot.lapozz.com)
83.44.42.199 Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
84.148.107.62 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 larbin@unspecified.mail
84.148.108.134 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 larbin@unspecified.mail
84.81.17.28 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
85.101.47.187 Microsoft URL Control - 6.00.8169
85.108.164.241 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
85.125.153.160 larbin_2.6.3 larbin2.6.3@unspecified.mail
87.193.34.166 xyz

BTW, this was a slow night!

4 comments:

baraqyal said...

You block googlebot?

IncrediBILL said...

Read what I said: don't be taken in by fake user agent strings.

81.83.46.233 Googlebot/2.1(+http://www.googlebot.com/bot.html)
Official Name: d51532EE9.access.telenet.be
IP address: 81.83.46.233

I seriously doubt that's Google crawling me from Belgium, and they are allowed on my site based on a whitelist of IP address ranges, not by user agent string.
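A whitelist check along those lines might look something like the sketch below. It's an illustration under assumptions, not my real list: the `WHITELIST` table and `is_whitelisted` function are made-up names, and the ranges are documentation placeholder networks rather than any search engine's actual addresses.

```python
import ipaddress

# Hypothetical whitelist. These are reserved documentation ranges,
# NOT real crawler addresses; a real list would hold the known ranges.
WHITELIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_whitelisted(ip, whitelist=WHITELIST):
    """Return True only if the address falls inside a whitelisted range.

    The user agent string is deliberately not a parameter: anyone can
    type "Googlebot" into a header, but they can't fake the source range.
    """
    address = ipaddress.ip_address(ip)
    return any(address in network for network in whitelist)


# The Belgian "Googlebot" from the list is not in any whitelisted range.
fake_googlebot_allowed = is_whitelisted("81.83.46.233")
```

With that, the visitor from 81.83.46.233 gets treated like any other unknown client no matter what its user agent claims.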

Anonymous said...

hmm there are at least 3 on your list that search engines use for checking purposes

IncrediBILL said...

If they're sneaking around unannounced, c'est la vie, as I'm battling to save my site.

I let the legit known addresses thru so the others used for checking purposes must've done something bad to get snared - it's all automated.

If you don't mind sharing which addresses I'll look into it.