Saturday, January 05, 2008

Why The Hell Is Bloglines Crawling?

Let's start this investigation by noting that Bloglines themselves claim to be a crawler now when you use reverse DNS on their IP address:

65.214.44.29 -> crawler.bloglines.com
This is what Bloglines is supposed to do, read your RSS feed:
65.214.44.29 "GET /rss_feed.xml" "-" "Bloglines/3.1 (http://www.bloglines.com;XXX subscribers)"
However, they've stepped off the RSS path and started coloring outside the lines!

The first odd thing I noticed was it asked for robots.txt without any user agent defined:
65.214.44.29 "GET /robots.txt" "-" "-"
So I dug a little deeper and it appears they are running Firefox Minefield which was asking for a bunch of images from 3rd party websites where my graphic appears:
65.214.44.29 "GET /myimage.gif" "http://someotherwebsite.com/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20070308 Minefield/3.0a1"
Finally, I found them requesting some web pages that are NOT in any RSS feed, what the fuck?
65.214.44.29 "GET /anyoldpage.html" "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20070308 Minefield/3.0a1"
So, anyone have a clue what they're doing?

SCREENSHOTS!

Yes, they're making screen shots that appear on ASK.com!

I looked up a few pages from one of my sites in ASK and sure enough, instead of screen shots of the actual web pages there were screen shots of error messages with the Bloglines IP address of 65.214.44.29 in big bold numbers.

The reason I figured that out so easily was I recently decided to just block everything claiming to be coming from Linux just to see what came up and that's why they got an error page instead of a screen shot. Sure, I'm probably blocking a few innocent Linux users as well but they account for an insignificant part of my traffic and overlap with the same tools that servers use so sacrifices were made.

Anyway, what we've learned is that Ask is using Bloglines' IP to make screenshots and look at your robots.txt file yet they don't disclose what they're even looking for in your robots.txt file.

Wasn't that fun?

Friday, January 04, 2008

Does Covenant Eyes Divulge Their Members?

While monitoring activity from Covenant Eyes on one of my servers it became obvious that many of the pages being accessed were fairly unique, not as popular, and easily allowed me to figure out the actual customer Covenant Eyes was watching.

To test my theory I checked the log file for one unique page Covenant Eyes requested and sure enough only a single IP had accessed that file during the course of the day.

Then I got a list of all files that this visitor's IP had viewed and compared it to all the files that Covenant Eyes requested and it was an exact match in the exact same order of access, without any obfuscation, so it was a 100% match without a doubt.

I've been monitoring this situation for several days now and it's always the same.

The visitor comes and views some pages and about 90-120 minutes later Covenant Eyes comes and asks for the exact same pages in the exact same order.

Here's a sample of a visitor's access:

127.0.0.1 "justapage.html"
127.0.0.1 "anyoldpage.html"
127.0.0.1 "justanotherpage.html"
127.0.0.1 "veryspecialpage.html"
127.0.0.1 "anotherrandompage.html"
A while later Covenant Eyes asks for the same pages in the same order:
69.41.14.x "justapage.html"
69.41.14.x "anyoldpage.html"
69.41.14.x "justanotherpage.html"
69.41.14.x "veryspecialpage.html"
69.41.14.x "anotherrandompage.html"
Same pages, same order, definite match with a unique page like "veryspecialpage.html" that nobody else visits on the same day. Additionally, they appear to request each monitored customer's pages very quickly in a single batch, so it's pretty easy to see that those requests belong to a single visitor, making identification even simpler.

Now with a simple script I can find out who they were monitoring with extreme accuracy, as long as the visitor looked at more than one page, or looked at a single page that nobody else requested that day.
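A minimal sketch of that kind of matching script, assuming the log has already been boiled down to (ip, page) pairs in access order and that one Covenant Eyes batch has been pulled out as a plain list of pages. The function names and the data shape are made up for illustration, not taken from any real tool:

```python
from collections import defaultdict


def contains_run(pages, batch):
    """True if batch appears in pages as one contiguous, same-order run."""
    n = len(batch)
    return n > 0 and any(pages[i:i + n] == batch for i in range(len(pages) - n + 1))


def find_monitored_visitors(visitor_hits, monitor_batch):
    """Return visitor IPs whose page sequence contains the monitor's batch.

    visitor_hits: list of (ip, page) tuples in log order (hypothetical shape).
    monitor_batch: pages the monitoring service requested, in order.
    """
    pages_by_ip = defaultdict(list)
    for ip, page in visitor_hits:
        pages_by_ip[ip].append(page)
    return [ip for ip, pages in pages_by_ip.items() if contains_run(pages, monitor_batch)]
```

If exactly one IP comes back, that's your visitor. If several come back, the batch was made of popular pages and the match is ambiguous.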

Making it harder to identify which visitor they're monitoring wouldn't be that difficult just by staggering and randomizing their page requests over the course of the day. However, I still don't see how you could protect the identity of your customer if that was the only customer of the day that accessed that web site unless you throw in a few bogus page requests to throw a webmaster off the trail. Even with randomization and fake page requests you would still have a problem if that customer was the only one to access a specific page as mentioned above, but at least it would be a start in making the monitoring activity just a little more covert and possibly less traceable.

The site of mine where I did this experiment, which isn't this blog, gets from 20K-40K visitors daily, so if I can easily find a needle in that big haystack then it would be trivial on a low traffic site.

Tuesday, January 01, 2008

Romanian Scrapers Go Apeshit on New Years Day

The stealth scrapers attempting to hit my site have been really laid back lately but on Jan 1 '08 the Romanian scrapers went apeshit, or at least tried, followed by a few others.

Needless to say, the bot trap was very busy today.

So far today this is what the little Romanian fuckers tried:

89.122.29.31 requested 333 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

89.122.16.96 requested 336 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

89.122.29.35 requested 337 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

89.122.29.32 requested 336 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
Then someone from Vietnam tried to join the fun:
203.162.3.153 requested 340 pages as "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
A quick visit from the Ivory Coast:
41.207.2.87 [host-41-207-2-87.afnet.net.] requested 339 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
Then maybe a human with issues...

Someone from Venezuela paid a quick visit with what appeared to be a broken browser that asked for a bunch of pages, which the visitor probably wasn't even aware of:
201.210.138.88 [201-210-138-88.genericrev.cantv.net.] requested 153 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Every time the browser would ask for a page it would then ask for the home page about 5-10 times in just a few seconds, what the fuck is up with that?

Anyway, it was considered an automated attack, fuck it.

Anyone else have a wild scrape attack today?

How to Identify Screen Shot Makers

Have you ever wondered how I figure out where screen shots originate from?

My trick of the trade is the SPARE DOMAIN!

All my unused domain does is print out information about whoever or whatever just visited the site with the IP address in REALLY BIG BOLD LETTERS so it's easy to read on a small screen shot thumbnail.

Therefore, if someone makes a screen shot I can tell who's doing it just by looking at the screen shot and block them from doing it a second time if I don't like what they're doing with thumbnails of my site.
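For the curious, a rough sketch of what a page like that could look like. This only builds the HTML; the function name, the font size, and how it gets wired into a web server are all assumptions for illustration:

```python
from html import escape


def render_visitor_page(ip, user_agent, referrer="-"):
    """Build an HTML page that shows the visitor's IP in huge bold type.

    Everything is escaped because user agent and referrer are attacker-controlled.
    """
    return (
        "<html><body style='text-align:center'>"
        f"<p style='font-size:120px;font-weight:bold'>{escape(ip)}</p>"
        f"<p>User-Agent: {escape(user_agent)}</p>"
        f"<p>Referrer: {escape(referrer)}</p>"
        "</body></html>"
    )
```

The giant font is the whole point: it has to survive being shrunk down to a thumbnail.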

DomainTools Whois and AboutUs Site Accesses Revealed

The DomainTools Whois is now collecting and displaying more information than ever about our web sites. Their Whois display used to be limited mostly to public registration information such as Whois, the IP address, where you host and the basics. Then DomainTools expanded Whois a while back and started taking data straight from our domains without permission, without even looking at robots.txt to see if we want to participate. The screen shots were no big deal but then they added some SEO text browser that allows people to snoop on your site and who knows what's next.

If that wasn't enough, then along came their Wiki companion site AboutUs.org, which scraped off some data as well. AboutUs does seem to use robots.txt, see backwards robots access below, but by the time you find out about the bot it's too late because you already have scraped content on your domain's AboutUs Wiki page.

Enough is enough, it's official, I'm annoyed.

Since I could find no way to "opt-out" of all the new toys on DomainTools Whois I decided it was time to opt-out the old fashioned way and just block 'em.

If they had just identified themselves in the User Agent this would've been easy because those are all monitored on my main site automatically. However, it appears that DomainTools either doesn't know how to put their information in the User Agent field for the tools they use, or they really don't want to get snared and stopped easily, because they use standard Firefox and MSIE user agents for accessing your site.

However, note that the referrer does claim that it's coming from DomainTools so you can at least use that as an indication it's them although the User Agent field would've been preferred since it is the standard for this sort of thing.

Here's a sample of DomainTools SEO Text Browser hitting your server:

66.249.16.212 "GET /" "http://whois.domaintools.com/somedomainname.com" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
The SEO Text Browser thing looks like it might be telling the webmaster who's snooping on their site because I caught it claiming to be a proxy that was forwarding information for my IP address when I was looking at the site so using it is far from anonymous!

66.249.16.211 "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11" "/" Proxy Detected -> VIA=1.1 www.domaintools.com FORWARD=aaa.bbb.ccc.ddd

Of course your average webmaster would never see this proxy information because it's not in your default log file, but I log proxy details and a whole lot more.
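A small sketch of how proxy details like that could be pulled out of the request headers. It assumes the VIA part comes from the standard Via header and the FORWARD part from X-Forwarded-For, which is a guess about what sits behind the log line above; the function name is hypothetical:

```python
def detect_proxy(headers):
    """Return a log-style summary if proxy headers are present, else None.

    headers: dict of request header name -> value.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    via = lowered.get("via")
    forwarded = lowered.get("x-forwarded-for")
    if via is None and forwarded is None:
        return None
    return f"Proxy Detected -> VIA={via or '-'} FORWARD={forwarded or '-'}"
```

Keep in mind both headers are supplied by the client side, so they can be faked or stripped just like a referrer.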

This is the DomainTools screen shot thumbnail generator hitting your site:

64.246.165.237 "GET /" "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"

Here's their companion site AboutUs.org which claims it uses robots.txt but didn't bother to check if I allowed them on my site until AFTER they had already been to the site as the access was in exactly the order shown below.

66.249.16.207 "GET /" "-" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"

66.249.16.207 "GET /robots.txt" "http://www.somedomainname.com/" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"

You might want to block AboutUsBot unless you want them to freely license whatever shit they scrape off your site with the claims on the bottom of their site:
All content is available under the terms of the GFDL and/or the CC By-SA License
If you want to keep them from snooping your site the IPs I'm currently blocking are:
66.249.16.*
66.249.17.*
64.246.165.* (screen shots)
So there's all I know at this time, you have robots.txt and htaccess files, you know what to do.
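If you'd rather do the blocking in application code than in htaccess, here's a minimal sketch using Python's standard ipaddress module. The helper names are made up, and it only understands the a.b.c.* style of pattern used in the list above:

```python
import ipaddress

BLOCKED_PATTERNS = ["66.249.16.*", "66.249.17.*", "64.246.165.*"]


def pattern_to_network(pattern):
    """Turn a trailing-wildcard pattern like '66.249.16.*' into a /24 network."""
    octets = pattern.split(".")
    if len(octets) != 4 or octets[3] != "*" or "*" in octets[:3]:
        raise ValueError(f"only a.b.c.* patterns are handled: {pattern}")
    return ipaddress.ip_network(".".join(octets[:3]) + ".0/24")


BLOCKED_NETWORKS = [pattern_to_network(p) for p in BLOCKED_PATTERNS]


def is_blocked(ip):
    """True if the address falls inside any blocked /24."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```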

UPDATE:

They are also running screenshots from 216.145.16.*
Wonder how many other blocks of IPs they're using?

UPDATE UPDATE:

I was accused of confusing DomainTools and AboutUs.org!

I was never confused. Whoever posted that apparently thinks so just because I lumped them together, which I did because they operate from the same IP space, their whois records have the same address, and they share some data such as the thumbnails.

AboutUs uses the thumbnails from DomainTools and DomainTools Whois has a link from every domain to "AboutUs: Wiki article on ..." what would you call them?

I never said they were the same company, totally not confused, but whatever makes you happy.

UPDATE UPDATE UPDATE:

I knew the connections would be spelled out somewhere on the 'net when I had a little more time to do some snooping on the site.
One of the questions posed was about our connection with Name Intellignece. Jay Westerdal, CEO of NameIntelligence.com, in fact, recently stepped down as AboutUs CTO...
Confuse THAT!

Monday, December 31, 2007

How Much Nutch is TOO MUCH Nutch Revisited

To date there have been 585 unique IPs hitting my server since I started tracking this nuisance called nutch.

Here's a list of IPs with nutch sightings to date:

12.47.49.97
13.1.137.86
13.1.139.202
13.1.139.205
13.1.139.206
13.1.139.211
13.1.139.212
13.1.139.213
15.203.249.124
24.12.140.54
24.222.153.250
24.231.207.219
24.247.204.244
24.5.71.1
24.6.168.184
24.94.62.119
35.10.2.90
58.186.61.164
58.187.12.236
58.187.22.230
58.215.74.242
58.215.74.253
58.215.75.2
58.68.42.138
58.87.139.90
59.160.240.115
59.160.240.116
59.160.240.183
59.160.240.184
59.160.240.185
59.176.10.136
60.248.9.114
61.135.151.175
61.246.2.241
61.8.140.20
62.129.132.47
62.168.188.151
62.192.109.66
62.192.11.2
62.40.33.173
62.40.36.87
62.54.4.138
63.133.162.98
63.246.7.209
63.82.23.2
64.105.36.210
64.106.247.178
64.18.197.136
64.209.138.200
64.229.206.25
64.229.222.170
64.229.226.126
64.229.33.51
64.231.233.162
64.236.128.27
64.241.242.18
64.242.88.10
64.242.88.60
64.34.172.78
64.34.180.167
64.38.10.26
64.47.51.158
64.71.164.125
65.120.64.146
65.220.67.9
65.92.160.39
65.95.155.163
66.132.240.180
66.132.249.23
66.135.44.34
66.135.44.35
66.135.44.36
66.135.44.37
66.135.44.38
66.135.44.39
66.135.44.40
66.135.44.41
66.135.44.42
66.135.44.43
66.135.44.44
66.135.44.46
66.135.44.48
66.135.44.49
66.135.44.50
66.135.44.51
66.135.44.52
66.135.44.53
66.15.68.234
66.207.120.226
66.24.192.59
66.24.198.171
66.24.199.39
66.24.240.206
66.243.31.34
66.30.10.222
66.92.153.138
67.110.56.45
67.110.58.2
67.111.28.139
67.184.246.61
67.202.20.30
67.202.49.49
67.202.6.11
67.52.101.242
67.68.42.2
67.70.155.226
67.71.89.27
67.95.51.86
68.178.171.109
68.178.202.79
68.205.124.164
68.205.127.94
68.228.72.198
68.97.222.117
69.248.26.83
69.36.233.8
69.55.233.28
69.60.125.233
69.90.45.7
69.93.236.178
70.143.79.234
70.187.130.253
70.197.81.79
70.21.122.162
70.48.46.56
70.50.75.8
70.56.66.216
70.62.103.114
70.85.198.178
70.87.14.34
70.90.188.18
70.96.99.254
71.216.0.210
71.217.33.149
71.241.153.125
71.35.163.79
71.98.182.170
72.0.207.162
72.2.25.66
72.2.25.67
72.2.25.71
72.21.6.146
72.21.6.147
72.21.6.148
72.232.202.50
72.232.223.234
72.232.228.58
72.233.38.194
72.233.38.195
72.233.38.196
72.233.38.197
72.36.114.145
72.36.114.147
72.36.115.42
72.36.115.45
72.36.115.47
72.36.115.52
72.36.115.53
72.36.115.54
72.36.115.56
72.36.115.57
72.36.115.59
72.36.115.64
72.36.115.65
72.36.115.68
72.36.115.69
72.36.115.70
72.36.115.72
72.36.115.73
72.36.115.74
72.36.115.77
72.36.115.79
72.36.115.80
72.36.94.100
72.36.94.106
72.36.94.107
72.36.94.109
72.36.94.110
72.36.94.112
72.36.94.113
72.36.94.118
72.36.94.119
72.36.94.121
72.36.94.122
72.36.94.123
72.36.94.124
72.36.94.169
72.36.94.173
72.36.94.176
72.36.94.179
72.36.94.181
72.36.94.182
72.36.94.20
72.36.94.201
72.36.94.203
72.36.94.243
72.36.94.38
72.36.94.39
72.36.94.48
72.36.94.50
72.36.94.52
72.36.94.54
72.36.94.56
72.36.94.60
72.36.94.61
72.36.94.68
72.36.94.90
72.36.94.92
72.36.94.96
72.36.94.99
72.36.95.12
72.36.95.131
72.36.95.134
72.36.95.145
72.36.95.146
72.36.95.147
72.36.95.148
72.36.95.149
72.36.95.150
72.36.95.152
72.36.95.154
72.36.95.155
72.36.95.156
72.36.95.157
72.36.95.158
72.36.95.160
72.36.95.161
72.36.95.162
72.36.95.165
72.36.95.166
72.36.95.167
72.36.95.168
72.36.95.170
72.36.95.173
72.36.95.176
72.36.95.177
72.36.95.178
72.36.95.179
72.36.95.183
72.36.95.185
72.36.95.207
72.36.95.209
72.36.95.212
72.36.95.214
72.36.95.217
72.36.95.218
72.36.95.226
72.36.95.227
72.36.95.230
72.36.95.231
72.36.95.232
72.36.95.236
72.36.95.237
72.36.95.238
72.36.95.239
72.36.95.251
72.44.58.104
72.44.58.167
72.44.58.173
72.44.58.244
72.44.58.252
72.44.62.107
72.44.62.122
72.44.62.124
72.44.62.151
72.44.62.162
72.44.62.166
72.44.62.197
72.44.62.199
72.44.62.208
72.44.62.245
72.5.173.12
72.5.173.22
72.51.37.148
72.84.30.230
74.111.22.20
74.111.7.226
74.208.11.120
74.39.192.237
74.52.54.130
74.69.164.2
74.98.30.178
74.98.32.176
75.126.142.100
75.126.204.194
75.44.225.44
80.38.119.131
80.79.35.55
81.173.148.94
81.173.155.210
81.203.142.109
81.67.169.232
81.93.168.211
82.150.138.138
82.150.138.139
82.16.40.198
83.149.77.7
83.246.79.28
84.101.58.177
84.101.58.70
84.191.111.92
84.231.72.32
84.231.74.47
84.57.138.191
85.117.62.114
85.145.108.135
85.17.184.39
85.17.184.41
85.177.142.252
85.179.194.32
85.179.196.134
85.18.14.22
85.214.83.174
85.52.193.36
85.88.35.34
85.88.35.35
85.88.35.37
85.88.35.41
87.139.106.60
87.233.142.106
87.242.77.169
87.69.22.130
87.98.222.116
88.191.23.109
88.198.212.50
88.74.95.48
89.149.208.224
89.31.118.248
123.113.184.253
124.157.145.165
124.32.246.36
124.32.246.45
128.174.240.249
128.174.240.251
128.174.241.130
128.174.245.163
128.208.1.160
128.208.3.173
128.208.4.10
128.208.6.125
128.208.6.200
128.208.6.207
128.208.6.226
128.208.6.227
128.208.6.232
128.208.6.75
128.208.6.77
128.238.35.93
128.95.1.189
128.97.88.68
128.97.88.70
129.242.19.138
129.34.20.19
129.78.64.106
131.112.125.102
131.112.125.103
131.112.125.104
131.112.125.106
131.112.16.220
131.211.84.21
132.178.248.36
132.178.248.47
133.30.112.143
140.247.62.79
140.247.62.80
141.30.193.12
141.30.193.5
141.30.193.6
144.92.194.22
145.99.243.67
147.202.73.2
147.202.74.2
147.202.76.2
147.202.81.2
147.202.90.2
159.226.5.82
164.67.195.201
164.67.195.245
164.67.195.26
164.67.195.27
164.67.195.67
164.67.195.68
164.67.195.86
166.214.93.76
192.17.240.18
192.17.240.19
192.17.240.20
192.17.240.21
192.17.240.22
192.17.240.25
192.17.240.26
192.17.240.27
192.17.240.28
192.17.240.29
192.17.240.30
192.17.240.32
192.17.240.33
192.17.240.34
192.17.240.36
192.17.240.41
192.17.240.42
192.17.240.43
192.17.240.44
192.17.240.45
192.17.240.46
192.17.240.47
192.17.240.48
192.17.240.50
192.17.240.52
192.17.240.53
192.17.240.54
192.17.240.55
192.17.240.56
192.17.240.57
192.17.240.58
192.17.240.59
192.17.240.60
192.17.240.62
192.17.240.65
192.17.240.71
192.17.240.73
192.17.240.74
192.17.240.76
192.17.240.79
192.17.240.81
193.138.250.141
193.138.250.237
193.145.45.68
193.203.240.117
193.203.240.118
193.203.240.119
193.203.240.120
193.203.240.121
193.203.240.122
193.203.240.135
193.205.213.166
193.252.148.51
193.42.229.3
193.42.84.5
194.153.145.119
194.153.145.15
195.250.53.25
195.72.131.70
195.72.131.71
195.72.131.72
195.72.131.73
195.72.131.74
195.72.131.75
195.72.131.76
195.72.131.77
195.72.131.78
195.72.131.79
195.72.131.80
195.72.131.81
195.72.131.82
195.72.131.85
195.72.131.86
195.72.131.87
195.72.131.88
195.72.131.89
195.72.131.90
195.72.131.91
195.72.131.92
195.72.131.93
196.203.50.219
198.87.235.130
198.87.235.142
199.4.160.10
200.152.240.214
202.10.82.98
202.174.61.198
202.20.190.235
202.20.192.195
202.69.141.20
202.98.1.120
203.113.130.205
203.147.0.44
203.199.83.162
203.244.218.1
204.123.46.105
204.123.47.91
204.228.230.38
204.228.230.43
206.222.21.2
206.222.9.122
207.115.108.202
207.176.224.241
207.176.224.244
207.176.224.245
207.214.93.42
208.109.126.135
208.64.57.65
208.96.10.200
208.96.10.201
208.96.54.71
208.96.54.72
208.96.54.73
208.96.54.76
208.96.54.77
208.96.54.79
208.96.54.80
208.96.54.81
208.96.54.82
208.96.54.83
208.96.54.84
208.96.54.85
208.96.54.86
208.96.54.88
208.96.54.89
208.96.54.90
208.96.54.91
208.96.54.95
209.139.209.220
209.139.209.224
209.51.212.10
209.51.212.18
209.51.212.26
209.85.62.159
209.85.62.162
209.85.88.150
210.174.3.130
210.196.73.193
210.245.31.15
210.245.31.18
211.152.34.34
212.101.97.63
212.12.114.238
212.137.33.140
212.156.230.210
212.166.192.129
212.174.130.121
212.174.130.122
212.58.116.72
213.132.171.245
213.132.175.101
213.157.204.141
213.219.170.12
213.251.133.12
216.163.188.200
216.163.188.201
216.182.225.186
216.182.229.37
216.182.229.39
216.182.229.91
216.182.230.40
216.182.230.54
216.182.230.75
216.182.236.46
216.182.236.77
216.182.237.45
216.182.238.83
216.231.36.92
216.24.131.152
216.58.87.217
216.93.185.12
217.10.144.242
217.106.233.192
217.153.59.26
217.31.51.128
217.80.112.146
218.25.39.81
220.130.191.231
220.130.191.232
220.130.191.233
220.130.191.234
220.130.191.235
220.130.191.236
220.130.191.237
220.130.191.238
220.130.191.239
220.130.191.240
220.226.195.162
220.226.195.163
220.226.195.165
220.226.195.166
220.226.195.167
220.226.195.168
221.114.253.210
221.116.237.114
221.221.140.114
221.221.237.35
222.173.249.33
222.210.196.26
222.46.17.43
222.46.17.47
If I weren't blocking nutch my server would probably be down in flames from the nutch DDoS.

Nothing dangerous about giving away code, not a thing.

Saturday, December 22, 2007

Covenant Eyes Needs Accountability

Here's yet another company making money from hitting your server without permission.

This one is an online service called Covenant Eyes that has a Net Nanny type of service that's been hitting one of my sites for ages. Over time they have requested thousands of pages, never got anything but an error message, but always keep trying using a blank user agent.

They operate from this range of IP's:

Covenant Eyes, Inc MOG-69-41-14-0 (NET-69-41-14-0-1)
69.41.14.0 - 69.41.14.127
Yesterday they suddenly started using this user agent after years of being blank:
69.41.14.83 "libcurl-agent/1.0"
The website claims:
Covenant Eyes Software provides Internet Integrity with accountability reports.
I guess it depends on who defines "Internet Integrity" or "accountability" because I personally don't find much integrity or accountability in hiding why you're hitting my website behind blank user agents or some default user agent.

The site also claims:
A church in town lost it's pastor to porn...
Which brings up the point that any rogue webmaster could cloak very bad content to Covenant Eyes and think it's funny to get someone in trouble that has an "accountability report" sent to a boss, spouse or parent so I hope someone checks to make sure these reports are accurate before punishing someone.

IMO blocking 69.41.14.* should stop their members from being "tempted" to visit your sites.

Wednesday, December 19, 2007

Snared Human Claims "I Ain't No Bot!"

When you snare a human in your bot trap they might be a little feisty and squirm a little. Those snared humans may even send you a scathing email claiming complete innocence, your tools are broken, bad bot blocker, BAD!

Amazing that his tool appears to be the one broken, not mine!

I nicely replied to this snared human and asked if he could explain why he downloaded a couple of hundred pages in just a few minutes, many of them the same page over and over and over again, sometimes several per second.

Sorry Mr. Human but your browser exhibits the same behavior as one of those high speed scrapers that have attacked me in the past and you were shut down for behaving badly.
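The kind of check that catches this behavior can be sketched as a sliding-window counter per IP. This is only an illustration of the idea, not the actual bot trap; the class name and the thresholds are invented:

```python
from collections import defaultdict, deque


class RateTrap:
    """Flag any IP that makes more than max_hits requests inside the window."""

    def __init__(self, max_hits=100, window_seconds=300):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one request; return True if this IP is now over the limit."""
        recent = self._hits[ip]
        recent.append(timestamp)
        # Drop requests that have aged out of the window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_hits
```

A counter like this can't tell a scraper from a human with a misbehaving browser, which is exactly how a human ends up snared.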

I suspect he has PRE-FETCH enabled which is amusing because I have PRE-FETCH disabled server-side, so if he has it enabled it didn't identify itself as PRE-FETCH which is why he was snared.

Oh boo hoo, guess you'll just have to go waste someone else's bandwidth using that stupid browser that keeps downloading the same pages as fast as it can download them.

I won't miss you and don't let the door hit you on the way out.

Monday, December 17, 2007

Yahoo! Ignorance Shines in ShoeMoney Reputation Attack

Q: What do you do when your payment processing anti-fraud detection doesn't work?

A: It appears you fire your referring affiliate if your name is Yahoo!

That's right boys and girls, according to ShoeMoney the nitwits at Yahoo! obviously can't detect a fraudulent transaction and then blame someone who's under fire with a blatant reputation attack.

Now Yahoo! Stores and other properties do a lot of payment processing so they should have a ton of historical data, potentially from valid uses of the stolen credit cards themselves, so wouldn't you think with all this information they could flag a few fraud sales?

Apparently not.

OK, even if you don't have any historical data on the customer there are a few things you can do to easily combat what appears, based on the volume of transactions, to be automated fraud short of firing one of your affiliates.

1. Validate the account with email confirmation BEFORE processing the credit card in a 2 step process known as AUTH and BOOK. You pre-authorize the sale first, setting aside the money until you're sure the sale is valid and then BOOK the sale after the fact.

2. Require that the account creation and/or checkout page use several forms of automation blocking such as javascript and/or some form of captcha.

3. Obviously use full AVS (Address Verification) and require CSC / CVV2 (Credit Card Security Code) to make sure everything is OK per the credit card company.

4. Use GeoIP services to check that the IP address placing the order is even close to the actual address on the order and if not, flag it for human review before processing.

5. Do some basic IP blocking and restrict access to those account creation pages from hosting data centers, lists of known proxy servers, botnets and spammers.

There's a couple of other steps I'd take as well, but if someone could get past the 5 steps above without anything tripping at least one alarm for human review, I'd be shocked. Even if it was a human manually performing the attack the GeoIP should indicate a problem unless Yahoo just ignores it.
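Step 4 is the easiest one to sketch. Assuming you already have coordinates for the IP (from whatever GeoIP service you use) and for the billing address, the check is just a distance comparison. The function names and the 500 km threshold are made up for illustration:

```python
import math


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def needs_human_review(ip_location, billing_location, max_km=500.0):
    """Flag the order when the IP is too far from the billing address.

    Both locations are (latitude, longitude) tuples in degrees.
    """
    return distance_km(*ip_location, *billing_location) > max_km
```

GeoIP data is only approximate, so a check like this should flag orders for review rather than reject them outright.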

The only thing that cracks me up is ShoeMoney wanted to know what the referring URLs were and it's meaningless because the referring URL can be easily spoofed or blocked so it's a useless piece of information.

Consider that whoever did this only needed to visit your site one time to get your affiliate code. After that they can use automation to abuse it over and over again without ever visiting your site a second time, while always claiming in the referrer to be coming from your site.

Cute huh?

Better yet, they didn't have to visit your site EVER because you allow your pages to be cached in the search engines so anyone could get your affiliate code directly from the search engines without leaving a trail on your website.

I've been preaching about using the meta "NOARCHIVE" for years now and this is just another reason to use it, but nobody listens and I digress...

Just to prove that the Michelle from Yahoo! was completely clueless about how internet fraud works she asked ShoeMoney to do the following:

I wanted to give you a heads up in advance to see if there was anyway you could filter or prevent fraudulent users from coming through your website/links. If so, we’d like to continue our partnership.
The odds are very high that this activity isn't passing through ShoeMoney's site whatsoever, even if it's being done manually, because they don't want to leave a trail that's too obvious.

Sorry to see you get the boot Shoe (punny) but it would appear that Yahoo! doesn't mind making a public spectacle of their shortcomings and now it's open season on YSM thanks to them admitting they can't tell a fraud transaction.

This should be loads of fun to see what happens next.

Monday, December 10, 2007

Block List Babelfish Desperately Needed

After spending a few days trying to come up with a more comprehensive method of identifying known pre-existing bad IPs using the existing block lists it has become quite maddening.

SpamHaus has their collection criteria which comes up with one set of BL results, ProjectHoneyPot has their methods and even different results, and so on and so forth. Then I have my methods which traps IPs that may intersect those BL's but quite often cough up brand new IPs not showing in the other BLs for spammers and scrapers. Collectively all of these BLs, including my own, are quite comprehensive but unfortunately there's no easy way to combine them all in a real-time manner that makes sense.

Sadly, the current state of affairs is that there are just too many independent services, which makes the process overwhelming for the average webmaster, who probably opts to just pick one out of frustration and lets things slip through the cracks. Picking block list A over block list B might be the difference between your server getting hacked or not, just because one list knew about the malicious botnet IP and the other list didn't.

Funny, if this were anti-virus software people wouldn't just pick any old thing, they would want comprehensive coverage, so why can't we get comprehensive coverage in block lists?

What is desperately needed is some mechanism to pool all the results together into one common service, a Block List Babelfish, where a single access can get the combined collective intelligence on whether the IP is good or bad so that everyone can easily benefit.
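The core of such a Babelfish is simple enough to sketch. This assumes each block list has been wrapped in a function that takes an IP and returns True if it's listed; those wrappers, and every name below, are hypothetical:

```python
def aggregate_verdict(ip, sources):
    """Ask every block list about an IP and pool the answers.

    sources: dict of list name -> callable(ip) returning True if listed.
    Returns (is_bad, names_of_lists_that_flagged_it).
    """
    flagged = []
    for name, is_listed in sources.items():
        try:
            if is_listed(ip):
                flagged.append(name)
        except Exception:
            # A list that is down should not take the whole check down with it.
            continue
    return bool(flagged), flagged
```

The hard part isn't this loop. It's that every list has its own lookup method, criteria and result codes, so each wrapper has to be written and maintained separately.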

If anyone knows of a good BL aggregator let me know, OK?

Saturday, December 08, 2007

Validate Link Integrity Using DNSBL's like SpamHaus ZEN

People tend to just think that lists from sites like SpamHaus are only good for blocking spam from coming into your servers but that's just the tip of the iceberg if you're open to some creative thinking.

Since Google penalizes sites that link out to bad neighborhoods one potential use for SpamHaus ZEN is to help automatically identify bad sites and remove them. For people that run directories or have massive amounts of outbound links this means you can protect your visitors, as well as your reputation in Google and other places, via zen.spamhaus.org and eliminate links to IPs associated with spammers, 3rd party exploits, proxies, worms and trojans!

How's that for a kick ass way to clean up your site?

Keep in mind that on a shared server a single IP address may represent multiple domains. That means any domain on a server either spamming or otherwise compromised will impact all domains associated with that IP, so many people may be affected who don't know there's a problem. However, since that server can be a hazard to the general population at large, it's best to err on the side of caution and suspend your association with all sites on that server until the problem is resolved.

Since most sites don't even know that they've been infected I merely quarantine those links until they are no longer being reported as hostile and then enable them again after they have been confirmed to be clean.

Not that everything will be listed in SpamHaus ZEN as much of the malicious activity I see isn't in their index, but it's a good reference for known bad sites.

Here's an example of how to check an IP address in SpamHaus using a spammer's IP currently in the DNSBL.

Take the IP address 64.151.120.13, reverse it to 13.120.151.64, and then append zen.spamhaus.org to the reversed address like this: 13.120.151.64.zen.spamhaus.org.

Using any DNS checking tool, query the DNSBL for the existence of 13.120.151.64.zen.spamhaus.org.

If the IP is currently in the DNSBL you'll get a result like this:

host 13.120.151.64.zen.spamhaus.org
13.120.151.64.zen.spamhaus.org has address 127.0.0.2
If the IP address is not in the DNSBL you'll get a response like this:
host 13.120.151.123.zen.spamhaus.org
Host 13.120.151.123.zen.spamhaus.org not found: 3(NXDOMAIN)
The result codes from SpamHaus are as follows:
127.0.0.2 - SpamHaus Block List (SBL)
127.0.0.4-8 - Exploits Block List (XBL)
127.0.0.10-11 - Policy Block List (PBL)
The last list, the PBL, is probably something I wouldn't auto-block with a link checker or any other use (except anti-spam) unless I reviewed what it was blocking first so those errors, if they ever come up, are only set as "warnings" in my current implementation.
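The whole lookup can be sketched in a few lines. This is not my link checker, just an illustration of the steps above; the function names are made up, the code ranges are the ones listed above, and a failed lookup is simply treated as "not listed":

```python
import ipaddress
import socket

ZEN_ZONE = "zen.spamhaus.org"


def dnsbl_query_name(ip, zone=ZEN_ZONE):
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def classify_zen_result(address):
    """Map a DNSBL answer to a list name using the code ranges above."""
    if not address.startswith("127.0.0."):
        return "unknown"
    last = int(address.rsplit(".", 1)[1])
    if last == 2:
        return "SBL"
    if 4 <= last <= 8:
        return "XBL"
    if 10 <= last <= 11:
        return "PBL"
    return "unknown"


def check_ip(ip):
    """Return the list name for a listed IP, or None if the lookup finds nothing."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip))
    except socket.gaierror:
        return None  # NXDOMAIN or lookup failure: treat as not listed
    return classify_zen_result(answer)
```

A caller that wants the PBL treated as a warning only would check for a "PBL" result and flag the link instead of quarantining it.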

Thursday, December 06, 2007

Bad Behavior Needs Behavior Modification

WebGeek recently reported on Bad Behavior Behaving Badly where he got locked out of all his own blogs and was listed as an enemy of the state and put on the FBI's 10 most wanted geek list and all sorts of things.

OK, I'm exaggerating but read his post and it's close enough.

Anyway, there was something he mentioned about being concerned with:

"If left unattended in this state for a long time, a site could lose valuable search engine rankings, after the spiders of the Big 3 (Google, Yahoo, and MSN) find that they are locked out repeatedly with 403 errors."
Since he mentioned it, I've looked over the source code for Bad Behavior before, and how they validate robots isn't something I'd put on my website because it relies solely on IP ranges, and those ranges are incomplete based on raw information I've collected from the crawlers themselves.

The search engines have clearly stated that they may expand into new IP ranges at any time without notice, and the only official way to validate their main crawlers, Googlebot for instance, is full round trip DNS checking, with IP ranges as a backup just in case they make a mistake.

So this code could easily be obsolete at any time:
if( stripos($ua, "Googlebot") !== FALSE || stripos($ua, "Mediapartners-Google") !== FALSE) {
    require_once(BB2_CORE . "/google.inc.php");
}

// Analyze user agents claiming to be Googlebot
function bb2_google($package)
{
    if (match_cidr($package['ip'], "66.249.64.0/19") === FALSE && match_cidr($package['ip'], "64.233.160.0/19") === FALSE) {
        return "f1182195";
    }
    return false;
}
Even more importantly, I've tracked Google crawlers in the following IP ranges which is 2 more IP ranges than Bad Behavior has in their code!
64.233.160.0 - 64.233.191.255
66.249.64.0 - 66.249.95.255
72.14.192.0 - 72.14.239.255
216.239.32.0 - 216.239.63.255
The same criticism applies to validating the other bots: Bad Behavior needs a little more robustness in the validation code so that it isn't accidentally blocking valid robots from indexing web pages. Unless I'm missing something, I don't even see where Yahoo crawlers are specifically validated (I'm tracking 11 IP ranges for Yahoo), and MSNBOT was missing the 131.107.0.0/16 CIDR range, etc.

As it stands, the code doesn't have all the IP ranges that I've seen used for any of the major search engines so there is some risk, albeit not a big risk, that some legitimate search engine traffic is being bounced.
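To make the round trip DNS idea concrete, here's a rough Python sketch, offered as an illustration and not as a patch for Bad Behavior. The range list is just the four Google ranges listed above, the ".googlebot.com" suffix is an assumption you should verify against Google's own documentation, and treating the IP ranges as the fallback only when DNS itself fails is one interpretation of "backup". All function names are hypothetical.

```python
import ipaddress
import socket

# Ranges copied from the list in this post; a snapshot, not an official list.
GOOGLE_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("72.14.192.0", "72.14.239.255"),
    ("216.239.32.0", "216.239.63.255"),
]

# Assumed reverse-DNS suffix for Googlebot hosts.
GOOGLE_SUFFIXES = (".googlebot.com",)


def in_ranges(ip, ranges=GOOGLE_RANGES):
    """True if the IP falls inside any of the (first, last) ranges."""
    addr = ipaddress.IPv4Address(ip)
    return any(
        ipaddress.IPv4Address(lo) <= addr <= ipaddress.IPv4Address(hi)
        for lo, hi in ranges
    )


def round_trip_dns(ip, suffixes=GOOGLE_SUFFIXES):
    """Reverse-resolve the IP, check the hostname, then resolve it forward.

    Returns True or False, or None when the DNS lookup itself failed.
    """
    try:
        host = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None
    if not host.lower().rstrip(".").endswith(suffixes):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return None
    # The hostname must map back to the IP that made the request.
    return ip in forward_ips


def is_googlebot(ip, resolver=round_trip_dns):
    """DNS verdict first; fall back to the IP ranges only if DNS failed."""
    verdict = resolver(ip)
    if verdict is None:
        return in_ranges(ip)
    return verdict
```

The resolver argument is only there so the decision logic can be exercised without touching the network.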

Not only that, but the MSIE validation is full of holes and most of the stealth crawlers I block will zip right through Bad Behavior and scrape the blog.

I think WebGeek is right, I would disable the add-in until those issues are resolved.

LiteFinder REALLY Go Fuck Yourself Now

In my opinion this whole LiteFinder Network Crawler is completely bogus.

Yesterday I commented on their crawler, which now just appears to be a ruse to lure people to their web site which is nothing but a big front for affiliate links.

Go to the LiteFinder home page and take a look at the main topics: Adult: Penis Enlargement, Online Gambling or the popular searches for "Phentermine" or "Breast Enlargement Pill".

Riiiiight.

This site is so spammy it would make Sanford Wallace blush.

The so-called search feature doesn't search shit, it just spits up a bunch of bullshit links.

Here are the results for a query on PLUMBING:

Shop
Browse and compare a great selection of .
www.somesite.com

Save up to 95% - diamond jewelry, engagement rings, designer watches, and much more. Live auctions starting at one dollar
somedomain.com

Gold, and Silver Jewelry
Great selection of jewelry including Rings, Necklaces, Bracelets, Pendants, Earrings, Body Jewelry, and Spazio watches.
somejewelry.com

Bored? Check Out the Sumo!
Viral video mayhem. Games Galore. Sucker free music. Bangin' Hotties. Animation for your fascination. Go to the Sumo, live large and never be disappointed by a weak video website again.
www.somesite.com

Etc. you get the idea...

What purpose is a crawler that doesn't feed a search engine?

You've got it, it's a lure, we've been had.

This LiteFinder Network Crawler thing just needs to be blocked, that's all there is to it.

Wednesday, December 05, 2007

LiteFinder Network Crawler Go Fuck Yourself

I don't get too riled up until I read some self-serving pompous bullshit like this that just makes the hair stand up on the back of my neck:

Can I learn the IP addresses, which LiteFinder Network Crawler comes from?
Unfortunately, You can't since it is against the rules of our company.
The user agent for this mess is:
"Mozilla/5.0 (compatible; LiteFinder/1.0; +http://www.litefinder.net/about.html)"
Since they don't feel like sharing the IP addresses, let me do the honors since it's not against MY company policy:
208.101.44.3 -> mybluewine.net.
209.160.65.42 -> hopone.net.
209.62.109.178 -> ev1s-209-62-109-178.ev1servers.net.
216.40.220.34 -> ev1s-216-40-220-34.ev1servers.net.
216.40.222.50 -> ev1s-216-40-222-50.ev1servers.net.
216.40.222.66 -> ev1s-216-40-222-66.ev1servers.net.
216.40.222.82 -> ev1s-216-40-222-82.ev1servers.net.
216.40.222.98 -> ev1s-216-40-222-98.ev1servers.net.
67.19.114.226 -> w103.networkharmony.com.
67.19.250.26 -> 1a.fa.1343.static.theplanet.com.
70.85.113.242 -> f2.71.5546.static.theplanet.com.
74.53.243.226 -> e2.f3.354a.static.theplanet.com.
74.53.243.242 -> f2.f3.354a.static.theplanet.com.
74.53.244.18 -> 12.f4.354a.static.theplanet.com.
74.53.249.34 -> 22.f9.354a.static.theplanet.com.
74.86.209.74 -> templatestill.com.
74.86.249.98 -> westhoste.net.
75.125.18.178 -> ev1s-75-125-18-178.ev1servers.net.
75.125.47.162 -> ev1s-75-125-47-162.ev1servers.net.
75.125.52.146 -> ev1s-75-125-52-146.ev1servers.net.
84.19.176.208 -> ns.km22118.keymachine.de.
87.118.118.111 -> ns.km31417.keymachine.de.
87.118.98.57 -> ns.km22427.keymachine.de.
87.118.98.62 -> ns.km22426.keymachine.de.

There you go, all the IPs I've seen them use and they can shove the rules of their company where the sun doesn't shine.

Surge Protection - Get it before it's TOO LATE!

I know many of you think surge protection is a bunch of hype but the father of a good friend just found out a few days ago that surge protection is a must have. Lightning apparently zapped their house and took out every single appliance, TVs, radios, computers and a nice big Wurlitzer organ all in one shot totaling over $20K in damages.

That was just enough to make me get off my ass and double check that all of our most expensive gear, like my computer, printers, big screen TV, DVRs, etc. were all plugged into the proper place on the UPS/Surge protector since the rainy season is starting in California.

For those of you that still have doubts about surge protection, and the odds that lightning will never hit your house, let me tell you about an old buddy of mine from Kansas City. He had a computer that got hit by lightning on the power line, fried the box. He went out and got a new computer and a surge protector for the electrical line. Then about a year later lightning hit the phone line and blew his computer apart when it came in via the modem. Again, he replaced the computer and this time put a surge protector on his phone line as well. Unfortunately, God didn't want him to have a computer and the 3rd time lightning shot in through the window and blew the computer off his desk. Last time I checked they don't make surge protectors for windows.

Anyway, if you don't have a surge protector for your electrical, phone and cable it's time to install one and move the computer away from the window so lightning can't easily blast it off your desk just to show you who's boss.

GEO Targeting Issues with Sprint Wireless Broadband

Testing my new Sprint Wireless Broadband turned up something that I didn't quite expect in regards to Geo targeting because the IP addresses used all are attributed to Southern California and I'm in Northern California.

I understand that privacy is a concern and you don't want people to know exactly where you are, but being off by 600 miles is a bit much, as nothing that tries to Geo target works right and some things can become downright annoying, such as AdSense showing you ads for local shit in Irvine, California.

Nothing show stopping, just annoying.

Saturday, December 01, 2007

Comcast Dead While Sprint Hobbles Along

My connection to the internet has been so reliable for so many years that I had almost forgotten that the whole goddamn thing is cobbled together with baling wire, band-aids and bubble gum.

Comcast in their infinite wisdom apparently did an upgrade to the network sometime Thursday afternoon and BOOM! the whole city went offline. When I called the message on their support line said people in my area just needed to power cycle the modem and it would reconnect. OK assholes, I had power cycled the fucking modem BEFORE I called for your tech support dept. to dole out bushels of meaningless platitudes and it still wasn't working so it's obvious I'm already fucked.

Tech support lady answered and asked me for my MAC address and in a couple of seconds confirmed that I was fucked and someone with a can of Vaseline and some rubber gloves would be sent out the next morning to finish the job, er FIX the problem.

Next morning someone shows up right on time, which was an omen, and diagnoses the connection. Claimed everything was OK coming in so it must be the old modem, yeah right, whatever, swaps out the modem and gets us online and leaves.

Looks good, quick fix, right?

Wrong!

The new cable modem starts randomly taking a dump for a few minutes here and there and the next morning promptly decides to take a permanent dump and never comes back.

<SARCASM style: thick>
Yup, it was definitely the old modem having a problem.
</SARCASM>

So back to waiting on the line for the next technical support moron that knows way less about modems than I do, considering I've written software to drive a modem, which makes the idiotic conversation we're about to have not only insulting but maddening.

Here come the idiot tech support questions:

TS: "Can you power cycle the modem for me?"
ME: "If that worked we wouldn't be on the phone at the moment!"

TS: "Do you have the modem connected directly to the computer or a hub?"
ME: "What does that matter? A stand alone cable modem plugged into Comcast alone will synch to the network if it can find the network, which it can't. Would you like me to explain to you what those lights mean on the front of the modem? I've got plenty of time since I can't get onto the internet and do any work..."

TS: "We can't seem to contact your modem from here so we'll need to send out a service technician."
ME: "Same problem as yesterday that you already 'fixed' once but we can try it again."

Anyway, they finally gave us a time for the next service technician to arrive tomorrow.

To be honest, if Comcast is down it shouldn't matter because our city is Wifi enabled!

Yeah, right, I'm on the border of the city's Wifi signal so I can see it every now and then but it's not strong enough to connect with.

However, a bunch of idiot neighbors have unprotected wireless networks that I could just hop on and use if I were that kind of guy, tempting but no thanks.

Anyway, here I sit with ZERO faith in Comcast at the moment so I ran over to the Sprint store and picked up one of those nifty Wireless Broadband USB devices with an unlimited bandwidth plan for $60/month and a screaming [cough] 500kbps, but it beats dial-up.

Bring that new Sprint toy home, plug it in to the USB port, it self-installs and works out of the box without a hitch, sweet, right?

Well, it would be sweet except their fucking "Sprint Mobile Broadband Connection Manager" started crashing all the time. The application just blows up without warning, BLAMMO!, and down goes your connection. As a matter of fact, my computer NEVER crashes and this unstable software managed to lock up the PC to the point I had to do a cold reboot.

Guess I'll focus on the positive side that at least Sprint got me online, for some period of time, which is more than I can say for Comcast in the last few days.

They better get this shit fixed tomorrow because I'm bordering on going ballistic at the moment.

UPDATE: Comcast actually showed up on time and figured out the problem the second time and it was never the modem they replaced causing the problem, but what else is new.

Monday, November 19, 2007

Live.com's Search Spam Hysteria and Area 131.107.0.*

There are a lot of recent posts from people reaching a near hysteria fever pitch over what appears to be Live.com scouring the 'net looking for black hat sites doing things like cloaking or worse.

What they're all posting about is that MS Live.com seems to be doing some stealth crawling, sending bogus query strings to look for pages that change their response based on the query, which is what cloaked web sites do in order to display advertising related to the topic that brought you to the page.

However, I've seen a few thousand other mysterious page requests from that IP range which most of you probably haven't noticed that I'll share below, which may or may not be related, hard to say at this point.

Sometimes, but not always, the IP address claims to be coming via a proxy such as:

1.1 SEA-PRXY-02
1.1 SEA-PRXY-01
"1.1 NET-PRXY-03, 1.1 NET-PRXY-04"
1.1 NET-PRXY-04
1.1 RED-PRXY-30
... and more
Maybe some of this is unrelated, maybe it's totally relevant, who knows except MS and they aren't telling. However, starting as far back as 01/07/2007 my bot blocker started trapping what appeared to be stealth crawl activity in the 131.107.*** range:
01/07/2007 131.107.0.96
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)"

01/12/2007 131.107.0.95
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)"

01/15/2007 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)"
Then it appears a human responded to a bot challenge:
01/15/2007 15:56:38 RESPONSE 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)"
Then this BLANK user agent started hitting on the same day
01/15/2007 131.107.0.86 ""
Then the sudden challenges and responses on 131.107.0.104 happened again so maybe that really was a human behind at least one of those proxies, who knows.

The blank UA on 131.107.0.86 kept asking for thousands of pages for many weeks, including "/robot.txt" that made me giggle.

In the middle of all this there's this little nugget:
03/29/2007 131.107.0.96 "Wget/1.8.1"
Then in March there's another rash of challenges in 131.107.0.* and a single response on 131.107.0.104:
04/28/2007 RESPONSE 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"
What does it all mean? No clue yet...

Suddenly after months the blank UA's on the 131.107.0.104 megacrawl seem to come to a close.

Then we get this little gem:
05/30/2007 131.107.0.95 "LWP::Simple/5.805"
June has a mix of challenges and a couple of responses so humans may use that IP block every now and then.

Then these nuggets pop up:
07/10/2007 131.107.0.95 "Java/1.6.0_01"
07/10/2007 131.107.0.96 "Wget/1.8.1"
07/13/2007 131.107.0.86 "" the blank UA starts crawling again.
Blank UA shows up on other IPs:
07/23/2007 131.107.0.101 ""
07/23/2007 131.107.0.104 ""
07/23/2007 131.107.0.96 ""
07/24/2007 131.107.0.73 ""
07/26/2007 131.107.0.96 ""
07/27/2007 131.107.0.95 ""
Now one IP with blank UA crawls a few days:
10/16/2007 to 11/05/2007 131.107.0.104 ""
Then the PERL crawl begins:
11/15/2007 131.107.0.96 "libwww-perl/5.805"
11/16/2007 131.107.0.95 "libwww-perl/5.805"
And those last two IPs are still currently crawling as "libwww-perl/5.805" as I write this.

When you add it all up, a few things come to mind: Microsoft is checking for cloaking, has some pet projects possibly being tested, and/or is checking to see how websites respond to a browser user agent vs. user agents that are normally blocked. It's probably a mix of all the above.
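If you want to look for the same pattern in your own logs, something like this Python sketch would bucket hits from that range by the kind of user agent seen above. It's an illustration, not my bot blocker: the function names and bucket labels are made up, the tool prefixes are just the ones that showed up in the log excerpts in this post, and it assumes you've already parsed your log into (ip, user_agent) pairs.

```python
import ipaddress
from collections import Counter

# The range discussed in this post.
WATCHED_NET = ipaddress.ip_network("131.107.0.0/16")

# Non-browser user agents seen in the log excerpts above.
TOOL_PREFIXES = ("Wget/", "LWP::Simple/", "libwww-perl/", "Java/")


def classify_hit(ip, user_agent):
    """Bucket one request as 'other', 'blank', 'tool' or 'browser'."""
    if ipaddress.ip_address(ip) not in WATCHED_NET:
        return "other"
    ua = user_agent.strip()
    if ua == "":
        return "blank"
    if ua.startswith(TOOL_PREFIXES):
        return "tool"
    # Anything else merely claims to be a browser; that claim isn't verified here.
    return "browser"


def tally(hits):
    """Count buckets over an iterable of (ip, user_agent) pairs."""
    return Counter(classify_hit(ip, ua) for ip, ua in hits)
```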

See the response from msndude msg#3442263 on WebmasterWorld:
First, we appreciate the concerns and issues that have been raised and apologize for any incovenience this might have caused.

Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your conerns, we would request that you do not actively block the IP addreses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.
Please tell me what gives you the right to scan thousands of pages without permission and then threaten to dump our ass if we don't let you run rampant without control over our website?

That's some pretty big balls even for Microsoft!

Since it's annoying some people for no sane reason I say go block the IP range and go back to sleep because Microsoft doesn't send enough traffic to put up with this abuse in the first place.

Besides, Microsoft has some damned explaining to do before they have any room to bully people as I've got quite the list of documented abuse from that IP range that would justify anyone blocking the bad behavior exhibited on 131.107.0.*.

That's my $0.02.

FIRST LOOK: Yahoo Crawler Using Firefox UA

Woke up this morning to find my bot blocker had bitch slapped 300+ crawl attempts by Yahoo using the following criteria:

74.6.22.170 [llf520057.crawl.yahoo.net.] requested 302 pages as
"Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071102 BonEcho/2.0.0.4"
Upon further examination it appears that this activity started on 11/17/2007 and the IP address used is a Yahoo proxy and some of the forwarded IPs were:
74.6.18.46 -> rz502516.crawl.yahoo.net.
74.6.18.160 -> rz502426.crawl.yahoo.net.
74.6.18.163 -> rz502429.crawl.yahoo.ne

a lot more 74.6.18.* IPs etc., you get the idea...
What was curious is the version of Firefox claimed to be Bon Echo which if I'm not mistaken was pre-release Firefox 2 code.

Didn't look like they were making screen shots based on today's activity unless they had already cached the images so I'm not sure what in the hell Yahoo's up to at this point.

Take a look in your logs as I find it hard to believe I'm the only one seeing this.

Saturday, November 17, 2007

Don't Just Block Spam, Block Spammers Too!

Most modern blog anti-spam efforts are based on just protecting the comment forms, which is a very narrow focus. When some spambot or someone posts something bad it's automatically trapped and discarded by tools like Akismet. However, I don't think this solution goes far enough to solve the problem as it only puts a band-aid on the comments page.

What I'm going to suggest, which I recently did to a few of my sites, is to go a step beyond just the comments page and punish bad behavior with banishment.

Why not ban the spammer?

You've trapped the spam and you know he/she/it is up to no good so why let them continue to access your site at all?

What if tools like Akismet not only blocked the spam but locked the spammer out of every site running Akismet worldwide?

If Akismet and a bunch of the other anti-spam tools could pool their spammer data then you could effectively block them from ever accessing any website ever again.

Now THAT's how you punish a spammer, ban him from the worldwide community!

This is not a new concept, as RBL lists have been used for things like this in the past: spammers' IPs were not only used to block incoming mail but added to the server firewall as well. However, the more recent web-based technologies have tended to be very narrowly focused and missed the bigger opportunity to thwart problem spammers in a better way, such as ACCESS DENIED to the web in general.
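As a rough illustration of the idea, here's a tiny Python sketch where a trapped spam comment bans the sender from the whole site and not just the comment form. Everything in it is hypothetical (the BanList class, handle_request, the 30-day default), and the ban expiry is an assumption added so that cleaned-up machines eventually get back in; a pooled version would share the list across sites instead of keeping it in memory.

```python
import time


class BanList:
    """In-memory list of banned IPs with an expiry time for each."""

    def __init__(self, ban_seconds=30 * 24 * 3600):
        self.ban_seconds = ban_seconds
        self._banned = {}  # ip -> time when the ban expires

    def report_spam(self, ip, now=None):
        """Called when the comment filter traps spam from this IP."""
        now = time.time() if now is None else now
        self._banned[ip] = now + self.ban_seconds

    def is_banned(self, ip, now=None):
        """True while the ban is active; expired bans are dropped."""
        now = time.time() if now is None else now
        expires = self._banned.get(ip)
        if expires is None:
            return False
        if now >= expires:
            del self._banned[ip]
            return False
        return True


def handle_request(ban_list, ip):
    """Front-end check run before ANY page is served, not just comments."""
    return 403 if ban_list.is_banned(ip) else 200
```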

Consider that many modern, well protected websites that are cranking up security block access from data centers and proxy servers, leaving spammers few options besides direct residential connections and botnets. If spammers rent out botnets, they would have to be made of hijacked residential PCs, since servers in data centers are often blocked already and won't do them much good. Therefore, assuming spammers were forced to use botnets to do their bidding, the bans would unwittingly land on the innocent owners of those machines, who would shortly discover their machines are infected and get them fixed.

What a concept!

Ostracizing spammers could even get people with compromised PCs off the botnet too!

Spammers would think twice about ever spamming again if each attempt permanently cost them more and more access to the web. So maybe, just maybe, we can end spam in our lifetime just by changing how anti-spam technology is deployed: as a complete front-end security system for the website that kicks in after the comment form triggers the alarm and alerts the entire anti-spam community.

OK, there could be a few innocent casualties but the greater good to permanently eradicate spam and even botnets completely outweighs the impact of a little friendly fire.

I'm banning spammers to clean up the online environment, how about you?