Saturday, January 28, 2006

Get a Grip You Loon!

Where does someone come off going nuts on me about a free listing on a free website?

Someone sent about a dozen emails and left a bunch of voice mails ranging from 7am-11pm while I was away for a couple of days jumping up and down wanting instant support for something that cost them nothing.

Come on people, it's FUCKING FREE!

This is the same reason I started dumping customers opting just for my own websites to get away from whiny assed people so DON'T PUSH ME ASSHOLES or the plug could be pulled!

Spammers Succeed Elsewhere but WHY?

Went looking for other sites potentially hit with the same spam garbage that my site kicked out and was shocked at just how vulnerable all these sites were.

Come on people, these spammers were trying to submit their garbage via an HTTP GET instead of a POST command so it's obvious there are some real shitty programmers out there just letting any old crap get submitted any way it can. Guess I never realized just how bad blog spam is as most places I frequent are pretty clean. However, what I saw today after reviewing a bunch of sites hit by the idiots knocking on my door is that most of the blame can be placed on shitty programming and poor validation techniques.

Case in point is this article "Some Things I Learned in 18 Years of Programming" and apparently in all those years form validation and anti-spamming techniques weren't in the list. Scroll down the page and you'll see what I mean, a real belly laugh if it wasn't so sad in the first place.

More blame could be put on the spammers but it's like locks, doors and thieves. If you don't lock your doors and they loot your house then you get what you deserve for being complacent. Remove easy access simply by locking the door or installing a security system and the lazy opportunists give up quickly. That only leaves you to contend with the more sophisticated spammers but unlike bandits putting guns to your head, spammers are usually a little easier to stop, especially the greedy ones, without taking a bullet to the head.

Oh well, not my problem with the exception that the idiots running wide open spamware encourage the little fuckers to attempt it on my site.
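
For anyone wondering what "basic validation" looks like, here's a minimal sketch of the idea: bounce anything that isn't a POST, then check the fields before they get anywhere near the database. The field names and the length limit are made up for illustration, so treat this as a rough outline under those assumptions, not the code running on any real site.

```python
# Hypothetical field names and limit for a directory listing form.
REQUIRED_FIELDS = ("title", "url", "description")


def validate_submission(method, fields, max_len=500):
    """Return a list of problems; an empty list means the submission passes."""
    # Form submissions should arrive as POST; anything else is bounced outright.
    if method.upper() != "POST":
        return ["method must be POST"]

    problems = []
    for name in REQUIRED_FIELDS:
        value = fields.get(name, "").strip()
        if not value:
            problems.append(f"missing {name}")
        elif len(value) > max_len:
            # Oversized fields are a cheap tell for stuffed spam payloads.
            problems.append(f"{name} too long")
    return problems
```

Anything that comes back with a non-empty list gets bounced, which is all it takes to make the lazy ones move on.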

Friday, January 27, 2006

Attempted Submit Spammer Too Stupid to Spam

Today I was looking for something in a log file just to see what was up and stumbled onto some HUGE strings that someone attempted to submit over and over, always with different data.

Someone appears to be hellbent to muck up my directory but they went overboard and didn't make sure the submission validated so my form submission pages bounced hundreds of them.

Lucky me!

What a mess that would've been!

Dropped the IP in the "DIE VIOLENT!" database so we don't have to worry about it for a few minutes.

Giving Scrapers a Cookie

Some scrapers appear to actually use the cookies your site transmits just to make sure they don't get stopped in case you block visitors that don't use cookies. In order to turn this to your advantage you can save the original IP address in the first cookie issued. This seems to be snaring a few of them as I'm using that IP in the cookie, if present, to track these idiots so that when they come back under different IPs, assuming they use some proxy or AOL with rotating IPs, that cookie they keep transmitting allows me to link their continuing activity to the original IP address.

So much for the IP shell game you morons.
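
Here's a rough sketch of the cookie trick, assuming a plain dict of request cookies and a hypothetical cookie name `first_ip`. It only shows the bookkeeping: the first IP seen goes into the cookie, and every later request carrying that cookie gets tied back to it. The validity check is my own addition for the sketch, since a cookie is client-supplied data.

```python
import ipaddress

COOKIE_NAME = "first_ip"  # hypothetical cookie name


def _valid_ip(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def track_visitor(cookies, remote_ip):
    """Return (original_ip, cookies_to_set) for one request.

    cookies is a dict of cookie name -> value sent by the client.
    """
    stored = cookies.get(COOKIE_NAME)
    if stored is not None and _valid_ip(stored):
        # Returning visitor: keep linking activity to the first IP seen.
        return stored, {}
    # No usable cookie: this IP becomes the original and gets stamped in.
    return remote_ip, {COOKIE_NAME: remote_ip}
```

A scraper can forge or drop the cookie, so a real deployment would probably want to sign the value; that part is left out here.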

Thursday, January 26, 2006

When Competitors Lose Their Minds

Sitting here minding my own business and this opt-in spam drops in my Inbox with a plea to buy advertising to generate some revenue for a new project for their web site.

Valley Girl sounds suddenly come out of my mouth:
"Like, Oh My God! Grody! Barf Me Out!"

Then the loud hyena laughter fills the air and dogs start barking blocks away.

Reading further they give the stats on exactly how many ads run maximum and that the placement costing a trivial sum of money is for a YEARLY rotation, not monthly but YEARLY.

Did some quick math and started yelling "YOU STUPID FUCKING MORONS!" as the amount being raised for this "project" is less than my AdSense pays in a few days while these numbnuts are horribly underselling the space and poisoning the well.

Hopefully my advertisers are smart enough to know that you get what you pay for and these wacky pleas for help show just what a weak player they really are.

Almost makes me want to call them and explain the facts of life but I'd hate to make them too smart if you know what I mean.

Idiots.

WebAbuse 2.0 Overnight Stats

Just to give some of you a hint that I'm not exaggerating about the amount of site crawling going on here's a list of 122 different IPs and agents automatically blocked last night that collectively attempted to access many thousands of pages. Some seem to have specific targets and only go after a few pages every time they return but others want to deep crawl the crap out of my site.

When you add it all up the bots are sometimes accessing more pages than the actual site visitors because this list doesn't include authorized bots.

FYI, don't be fooled by what you see: a user agent that looks legit means nothing, as a human can't click on and read 200 pages in 150 seconds. It's possible there is an innocent or two that was snared, but considering what's at stake I don't really care anymore.
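
The speed test is the easy part to picture. Below is a minimal sketch of a sliding-window check, assuming the 200-pages-in-150-seconds figure above as the default threshold; the class name and the in-memory storage are hypothetical and only there to show the logic.

```python
from collections import defaultdict, deque


class SpeedTrap:
    """Block any IP that requests max_hits pages within window seconds."""

    def __init__(self, max_hits=200, window=150.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent page views
        self.blocked = set()

    def allow(self, ip, now):
        """Record a page view at time now (seconds); return False if blocked."""
        if ip in self.blocked:
            return False
        recent = self.hits[ip]
        recent.append(now)
        # Drop page views that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            self.blocked.add(ip)
            return False
        return True
```

Note the user agent never enters into it, which is the whole point: the clock doesn't care what the bot calls itself.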

This is the future, run for cover:

12.221.77.114 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
128.2.220.167 PrivacyFinder/1.1
130.158.81.39 Wget/1.10.1
131.107.0.84 SandCrawler - Compatibility Testing
134.96.1.195 AnswerBus (http://www.answerbus.com/)
137.43.154.203 NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html; nutch-agent@lists.sourceforge.net)
139.18.2.43 findlinks/1.1-a8 (+http://wortschatz.uni-leipzig.de/findlinks/)
142.167.88.250 internal zero-knowledge agent
144.131.251.29 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)
151.24.66.200 Internet Explorer 5.5
162.40.193.253
172.169.142.20 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
172.203.82.76 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
193.165.250.22
193.42.229.3 NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
193.47.80.43 Exabot/2.0
194.167.196.3 Wget/1.10.2 (Red Hat modified)
194.67.3.21 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050920 Firefox/1.0.7
195.101.0.67
195.159.130.14 ZoomSpider - wrensoft.com
195.27.247.70 ColdFusion
195.37.209.45
195.39.234.162
195.70.35.179 KummHttp/1.1 (compatible; KummClient; Linux rulez)
196.209.78.70 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
201.230.91.192 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
201.26.110.67
202.165.102.186 SpiderMan
203.10.224.58
203.113.238.60
203.113.238.60 Random
205.209.169.222 MJ12bot/v1.0.7 (http://majestic12.co.uk/bot.php?+)
206.188.0.11 Jakarta Commons-HttpClient/3.0-rc2
207.148.212.242 PHP/4.1.2
207.171.172.6 Java/1.5.0_04
207.58.161.116
208.185.247.74 PageBitesHyperBot/600 (http://www.pagebites.com/)
209.131.61.1 NutchCVS/0.7 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
209.167.50.22 LinkWalker
209.178.137.175
209.18.119.138 Jakarta Commons-HttpClient/3.0-rc2
209.190.20.194 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
209.237.238.225 ia_archiver
210.17.148.245 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
210.173.180.156 ichiro/2.0 (http://help.goo.ne.jp/door/crawler.html)
211.5.60.108 RSS_READER (mctwist@mail.dr-k.info)
212.117.84.230 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
212.117.84.230 Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8) Gecko/20051111 Firefox/1.5
212.80.76.5 SeznamBot/1.1 (+http://fulltext.seznam.cz/)
213.133.123.154 libwww-perl/5.65
213.156.54.186 Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)
213.176.109.234 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
213.203.184.30 InetURL/1.0
213.42.2.11
216.195.47.98 Snoopy v1.2
216.22.48.28
216.247.238.226 VSE/1.0 (vivisimolog@web121.com)
217.212.224.142 psbot/0.1 (+http://www.picsearch.com/bot.html)
220.210.177.118 RSS_READER (mctwist@mail.dr-k.info)
221.116.237.114 NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
24.11.67.32 Java/1.5.0_06
24.177.134.6 aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)
24.19.240.172 Python-urllib/2.1
24.202.166.142 WebPix 1.0 (www.netwu.com)
24.216.179.135 Zeus 34366 Webster Pro V2.9 Win32
24.22.159.131 FyberSpider
24.242.26.149 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
24.5.187.223 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
24.57.8.78 EasyDL/3.04 http://keywen.com/Encyclopedia/Bot
38.113.234.181 voyager/1.0
58.64.126.5
61.135.131.173 sohu agent
62.163.40.65 Java/1.4.1_04
63.229.208.79 NextGenSearchBot 1 (for information visit http://about.zoominfo.com/PublicSite/NextGenSearchBot.asp)
64.127.124.159 OmniExplorer_Bot/5.85a (+http://www.omni-explorer.com) WorldIndexer
64.141.15.119 Wavefire/0.8-dev (Wavefire; http://www.wavefire.com; info@wavefire.com)
64.148.232.129 brfcaofenxv cdvP3k3xuesucrcxgPp3m
64.148.232.129 fWjnyc p ctcwmbbulcdeqw qew
64.164.63.175 Java/1.5.0_06
64.239.7.218 POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)
64.241.242.18 NutchCVS/0.05 (Nutch; http://www.nutch.org/docs/en/bot.html; nutch-agent@lists.sourceforge.net)
64.38.240.97 Roffle/l.ol(compatible; MSIE 6.0; Windows NT 5.0;
64.40.115.34 Python-urllib/1.16
64.5.245.27 genieBot (http://64.5.245.11/faq/faq.html)
64.94.163.151 Jakarta Commons-HttpClient/3.0
65.19.150.208 OmniExplorer_Bot/5.88 (+http://www.omni-explorer.com) WorldIndexer
65.24.45.49 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
66.117.176.20 Java/1.4.2_04
66.147.154.3 http://www.almaden.ibm.com/cs/crawler [fc14]
66.234.139.194 snap.com beta crawler v0
66.40.35.42 WWW-Mechanize/1.12
67.108.223.130 NextGenSearchBot 1 (for information visit http://about.zoominfo.com/PublicSite/NextGenSearchBot.asp)
68.127.10.143 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
69.0.235.24 Topular/1.0
69.238.36.166
69.41.14.5
70.124.116.68 FavOrg
70.34.224.188 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
70.49.144.182 Visual_Odyssey_Spider/3.0 (http://www.visualodyssey.com)
70.85.193.178 Poirot
71.102.140.247 envolk[ITS]spider/1.6 (+http://www.envolk.com/envolkspider.html)
71.137.197.195 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7
71.213.9.100 Lynx/2.8.5dev.7 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.6b
80.219.233.222 EmailSiphon
80.255.64.42 SIE-CX70/54 UP.Browser/7.0.2.2.d.3(GUI) MMP/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
80.77.86.240 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
81.1.87.163 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts)
81.155.34.158 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
81.19.66.38 StackRambler/2.0 (MSIE incompatible)
81.73.137.226 Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
81.83.46.233 Googlebot/2.1(+http://www.googlebot.com/bot.html) (Googlebot/2.1(+http://www.googlebot.com/bot.html); MSIE; Windows; SV1)
82.120.57.235 Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8) Gecko/20051111 Firefox/1.5
82.131.195.52 LapozzBot/1.4 (+http://robot.lapozz.com)
83.44.42.199 Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
84.148.107.62 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 larbin@unspecified.mail
84.148.108.134 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 larbin@unspecified.mail
84.81.17.28 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
85.101.47.187 Microsoft URL Control - 6.00.8169
85.108.164.241 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
85.125.153.160 larbin_2.6.3 larbin2.6.3@unspecified.mail
87.193.34.166 xyz

BTW, this was a slow night!

Wednesday, January 25, 2006

Zoom Zoom SPLAT!

How many fucking spiders and search engines do we need anyhow?

Apparently one more according to Wrensoft, the makers of leeches from downunder.

Obviously someone found another use for ZoomSpider and aimed it at my web site.

Even if they weren't getting in with that spider name the speed they hit my site would've blocked them anyway so Zoom my ass, you got NOTHING!

What's Microsoft doing?

Something hit my server today called SandCrawler and it appears to be coming from Microsoft.

Agent: SandCrawler - Compatibility Testing
Official Name: tide41.microsoft.com
IP address: 131.107.0.84

Anyone know anything more about this one?

Guess it doesn't matter as it crawled into a brick wall.

Net Wooing

Another crawler blatantly disregarding my bandwidth and copyright is NetWu with several products to make a webmaster snarl.

My site was hit with this one:
WebPix 1.0 (www.netwu.com)

Must be ethical to write these things as long as you don't worry about the consequences of their use.

The Bus Stops Here

Another Leech 2.0 technology [some call it Web 2.0] called the AnswerBus got a flat tire when it hit my web site. Don't know what question they were asked but the answer was "HELL NO!". They were even nice enough to put a list of other potential wasteful sites that can be blocked as well.

RSS Reader Got Legs

Don't know what it is but something coming from Japan calling itself RSS_READER is trying to crawl but it can't get in.

It's polite too and checks robots.txt then hits the home page WHAMMO! tries again WHAMMO! checks robots.txt again as it's obviously confused then the home page again WHAMMO! WHAMMO! WHAMMO! banging its head, tries one more round.

Probably someone's lame attempt to locate RSS files, so sad, too bad.

I'm sure it'll be back tomorrow for more fun and games.

Link Bait Manager

Well if you have to have an almost decent excuse to crawl my website all to hell it might as well be under the guise of being a reciprocal LinksManager which at least has some value.

Unfortunately, my web site is a directory with many thousands of pages and after observing their little bot fruitlessly flounder trying to find that reciprocal link it was obvious nothing good was going to come out of this so I blocked it.

If you really want to check links on my site I'd be more than happy to give you an XML API to do so but you'll crawl hundreds or thousands of pages looking for a single link over my dead body.

Buh bye LinksManager.com_bot, buh bye.

FYI, if you want to see a really nice but incomplete [didn't have this one] list of bots check out the database on Robots.org.

Everyone's a Critic

It's not like I run around the internet spreading filth on other people's forums as they'd boot my ass off except on threadwatch which is a bit liberal when it comes to the four letter words. Besides, it would be just rude to misbehave on other people's websites, or at least I wouldn't do it using a traceable name and IP, I'm not THAT self-destructive. So this is my version of my "Fortress of Solitude" where I can go off on a tangent on anything I damn well please.

Now the critics are coming out in droves [ok, 2 or 3 critics] questioning my content and how I express myself. Let me say that although I do get some sort of perverse pleasure from those types of disapproving comments that's not what this site is all about as I'm not trying to shock anyone, well not too much anyway, ok maybe I try to cause a mild stroke or an occasional headache but really I mean you no harm.

When I initially started this blog I thought I would just do nothing except write clean little helpful technical articles, maybe even slap AdSense on it, and then the evil side reared its ugly head. I haven't really seen the evil side in a long time since I ran a BBS back in the 80's, drew cartoons, and wrote all sorts of funny as hell but damning things. It crossed my mind to start up a second blog and let the evil side run rampant in its own little safe haven and not poison the well of my good intentioned technical posts but that never happened.

Maybe it was procrastination or perhaps the need to finally integrate the good with the evil that overcame me and it was time to resolve my split online persona and I said [bet you can guess this one] "FUCK IT!" and the blog went south and never came back.

You have to understand that I'm a guy and I like to do guy things and have guy type conversations but I'm sitting at home working 24/7 with just my wife to discuss things most of the time. She's actually pretty liberal about most topics, way more so than those girly girls and metrosexual men, but there are just times when I cross that line from what she'll tolerate as a conversation and I get the "Save this one for your friends!" quip. That leaves me in a pent up state worse than a teenager in the back seat of a car with a date that won't put out.

Since this is my Fortress of Solitude there will be times that I might just scratch my balls in public and if I happen to offend someone I'm horribly sorry but the political correctness filter has left the building.

You have been warned and thanks for visiting ;)

Exploration Halted

Don't know who these guys are at Omni-Explorer but it always seemed like a legit group that claims to honor robots.txt and posts their IP range so on and so forth. I always let them crawl since they claimed to be venture backed which made me think something useful would show up eventually and being a Silicon Valley guy I'm always curious about venture backed start-ups. However, it's been YEARS now and the bot keeps crawling with no benefit for me that I can see in the near future.

Sorry guys, but your exploration has ended.

OmniExplorer_Bot is now officially arachnida non grata

Taking your Key away

In the blocked crawler du jour contest we call your attention to our latest entry from Canada called the Keyword Encyclopedia.

Load up all your little .htaccess files and drop "EasyDL" in the list of unwanted party guests.

Ta ta bandwidth waster!
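
If you'd rather do the same thing in application code than in .htaccess, here's a small sketch of a user agent blocklist. The tokens are all bot names mentioned in these posts; the function name and the case-insensitive substring match are my assumptions about how you'd want to wire it up.

```python
# Tokens taken from user agents named in these posts.
BLOCKED_AGENT_TOKENS = (
    "EasyDL",
    "psbot",
    "ZoomSpider",
    "OmniExplorer_Bot",
    "Topular",
    "WebPix",
    "Website Quester",
)


def is_blocked_agent(user_agent):
    """True if the user agent contains any blocked token, ignoring case."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS)
```

Name matching only catches the ones dumb enough to announce themselves, so it works best paired with a speed check.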

V7ndotcom Elursrebmem Suspended in Gravy

Looks like the AdWords team doesn't have a sense of humor and missed the whole point of my most popular nonsense ad running on a nonsense search term.

Your disapproved ad:

Makes It's Own Gravy
Seriously, what kind of ads did
you expect on gibberish searches?
incredibill.blogspot.com
Ad Status: Suspended - Pending Revision
Ad Issue(s): Unclear/Inaccurate Ad Text

Interesting as I thought the satire was pretty clear but perhaps they couldn't get my web site to "Make It's Own Gravy" or maybe they took offense to being called a "gibberish search"?

FWIW, I guess Google really isn't in the AdWords game just to make money as that ad was the top earner racking up 1/3 of all the money I spent on that idiotic ad campaign!

Oh well, one down, 5 to go.

P.S. For you pundits out there I'm still on the fence about whether that whole ad campaign was my way of lampooning the contest or just a desperate cry for attention, you decide.

Yahoo's SpiderMan

Don't think I've ever seen Spiderman crawling before and APNIC claims it's from a registered block from Yahoo in China.

A lookup revealed:
Official Name: d24.search.cnb.yahoo.com
IP address: 202.165.102.186

APNIC claims it's them:
inetnum: 202.165.96.0 - 202.165.111.255
netname: YAHOO-ASIA-2

So now the big question is what's the advantage of letting an all English web site get crawled by a Chinese version of Yahoo?

Anyone have any insights on this?

Tuesday, January 24, 2006

It's a tangled Web we Copy

They call it WebCopier but it should be more appropriately called WebPirate.

The more these so-called products keep showing up the more I realize that the internet is mutating into something very ugly when copyright and bandwidth of the website owners isn't even a concern of the people building these tools.

It's starting to get depressing.

Harvest Gets Crop Failure

Caught Topular's hand in the cookie jar today.

69.0.235.24 - - [24/Jan/2006] "GET /myarticlesarenotyourarticles.html HTTP/1.1" 200 1302 "-" "Topular/1.0"

According to their website they do "Information Harvesting" without permission of course and no information posted about their robot or any other damn thing, but since it didn't check robots.txt there's no chance it's playing by the rules anyway.

Guess what Topular, your days of harvesting my shit are over.


Picture THIS!

We have yet another new bandwidth leech called PicSearch sucking the lifeblood of the internet coming to us from Sweden.

That's right boys and girls, add "psbot" to your blocked list.

I'm thinking about actually letting them index just one picture of me flipping them the bird with a caption "Pay my bandwidth fees you fuckheads".

Tools for Fools

Here's one to file in the KISS MY ASS column: Website Extractor

You may want to block anything with this user agent:
"Website Quester - www.asona.org"

Product claims to have the following benefit:
Website eXtractor saves you time and effort by downloading entire Internet sites (or the sections you stipulate) to your hard drive.
Let me help with this as I saved your customer even more time and effort as their sorry ass was blocked from downloading a single fucking page.

Assholes.


Not So Fav Icon

OK, when I pull a major blunder I do it right as minor fuck-ups are for amateurs.

Follow along carefully as this will all make complete senselessness eventually...

This problem all started sometime recently as it appears somewhere along the line my favicon.ico got zapped off my server and I never noticed nor did I bother looking at the logs close enough or I would've figured this out a few months ago.

Now imagine that my Apache configuration doesn't know the favicon.ico is an image and on a 404 error was actually displaying a 404 page for the missing icon.

Next, trying to capture more visitors instead of letting them see a 404 page and leave, possibly because of site maintenance errors, the 404 page at some point was redirected to my home page.

Last but not least, I changed my mind a few days ago and put my bot stopper code on the home page after seeing what scrapers could do with just that little amount of content.

Suddenly a small rash of people got banned with about 30 page views in 5 seconds. Looked at what was happening and these people hadn't downloaded 30 page views but had a shitload of requests to favicon.ico which looked very odd. Must be getting dense as it took a couple of days of seeing this before my little brain said "note the favicon.ico requests getting a 404 error".

Let's see what's going on here and try it:
http://www.youdumbasshole.com/favicon.ico

Up pops the home page!

Oh fuck.

So a browser or something asking for the favicon.ico about 20 times in a row loaded 20 404 pages which redirected to 20 home pages which tripped the scraper alarm and stopped them from accessing more pages temporarily.

Oooops!

Uploads favicon.ico, tucks tail between legs, hides quietly in the closet until the massive wave of embarrassment passes.
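
Uploading the icon fixed it, but the underlying trap is still there for any other missing image. One way to defuse it, sketched below under my own assumptions (the extension list and function name are hypothetical), is to hand missing asset-type files a plain 404 and only redirect missing pages to the home page.

```python
import posixpath

# Hypothetical list of extensions treated as assets rather than pages.
ASSET_EXTENSIONS = {".ico", ".png", ".gif", ".jpg", ".css", ".js"}


def respond_to_missing(path):
    """Return (status, redirect_target) for a request that hit a 404."""
    # Ignore any query string before looking at the file extension.
    ext = posixpath.splitext(path.split("?", 1)[0])[1].lower()
    if ext in ASSET_EXTENSIONS:
        # Missing icons and images get a bare 404, never the home page.
        return 404, None
    # Missing pages still bounce to the home page to keep the visitor.
    return 302, "/"
```

That way 20 requests for a missing favicon.ico never turn into 20 home page views for the bot stopper to count.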

Tales From the Crapped

If you're squeamish about bathroom situations, close your eyes while you read this.

So I'm sitting in the recliner and suddenly get a sharp pain in my side that feels like something large just outstretched vertically in my intestines so I stretched out in the recliner to try to relieve the pain, then it suddenly shifted horizontal causing me to spasm in the other direction. Feeling much like there's something playing X-games in my intestines this happens a few times from left to right, quite reminiscent of atomic diarrhea that accompanies the stomach flu.

Great, just what I need to be getting sick.

If I were gay I would've thought it was the baby kicking but I digress.

Now comes the moment of truth, I don't know if I have to fart or shit, never a good sign.

Then without warning or fanfare comes the sudden emergence of the turtle head and the mad dash to the bathroom.

This was no ordinary trip to the bathroom, I had to get a LaMaze coach to help me with my breathing "Now PUSH!" ... "UHHHN"... "PUSH!" so I can only imagine this is somewhat similar to child birth as it feels like I have just opened up so large I could suddenly slide over the toilet bowl.

When the accompanying paperwork is done is when this traumatic trip to the bathroom reaches epic proportions.... never in my life have I seen such a thing, it's a mutant, it's HUGE! it's ENORMOUS! It's the Empire Shit Building standing entirely up the side of the bowl laughing at me as I stare at this forearm sized dung heap in horror.

What the hell did I eat?

Now the moment of truth, time to flush.

[cue theme from Jaws: bum bum bum bum bum bum...]

The water goes up, up, up, up and over the top while this big clinker just sits there mocking me without budging. [digression: Water water everywhere and not a drop to drink] I'm quickly deploying any handy towels as fast as possible so this turdnami doesn't make it to the carpet.

Anyway, you get the idea, a lot of urping and plunging came next.

I need to change my diet as that's some crazy shit that I don't need.

Monday, January 23, 2006

Pressure to Perform

Now I'm starting to get nervous as the pressure is on now that we're up to 5 whole readers and people actually think this blog is funny when it started out semi-serious. My wife is accusing me of using gratuitous cursing just to pander to my main audience (you both know who you are) and claims the blog will end up sillier than Scrubs or worse yet when NBC cancels me like they did Will & Grace.

Not to mention Sebastian is claiming I have tits and suddenly people looking for tranny porn are landing here via MSN which is probably a step up from the usual horse sex crowd. [side note: most of the horse sex requests are coming from the Middle East, camels aren't in vogue anymore?]

Then someone shockingly saw right thru my facade:

When reading his blog one word and one word only comes to mind, curmudgeon.
My wife understands that comment as my online game playing persona is CrankyBaztard which she claims was no accident that I picked it as my name since it was such a natural fit.

So now I'm all nervous, to curse has become a curse, to not curse is worse.

Ah fuck it.

TIP: Asking your wife if you can "Play Moses and part the Red Sea" once a month does not work.

Too Stupid to Scrape

This one should be filed under "Oh My God What a Moron" as someone slammed my server today attempting to download my content with a minor twist - they got the case on all the page names wrong!

All of my pages use Upper/Lower case file names like Page1_Blah.html

Look at this shit:


0.0.0.0 - - [23/Jan/2006:14:14:48 -0600] "HEAD /page1_blah.html HTTP/1.1" 404 - "-" "-"
0.0.0.0 - - [23/Jan/2006:14:14:48 -0600] "GET /page1_blah.html HTTP/1.1" 404 1302 "-" "-"
0.0.0.0 - - [23/Jan/2006:14:14:53 -0600] "HEAD /page2_blah.html HTTP/1.1" 404 - "-" "-"
0.0.0.0 - - [23/Jan/2006:14:14:54 -0600] "GET /page2_blah.html HTTP/1.1" 404 1302 "-" "-"

The page names were all correct but lower case which sure as hell won't work on a Linux server!

Never thought I'd see a scraper too stupid to scrape!

That's one dumb asshole.
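
Turns out the wrong case makes a handy fingerprint. Here's a minimal sketch, with a hypothetical function name, that flags a request whose name matches a real page only when case is ignored, which is something a normal visitor following real links would never send.

```python
def find_case_mismatch(requested, known_pages):
    """Return the real page name if requested differs from it only by case.

    Returns None for exact matches and for names that match nothing.
    """
    if requested in known_pages:
        return None
    by_lower = {name.lower(): name for name in known_pages}
    return by_lower.get(requested.lower())
```

For example, with `Page1_Blah.html` in the list, a request for `page1_blah.html` comes back flagged with the correct name, while the properly cased request returns `None`.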

Poor Kitty, Too Funny

Just about hurt myself ROTFLMAO when I saw this entry in my webstats as some poor distressed pet owner searched MSN for "if cat barfs is there something wrong" and landed on Cat Tales of Horror as the #1 result.

Tears are still running down my face, that poor person, I almost feel bad for them.

Ya know what?

Fuck it.

They should get a dog if cat barf worries them so easily as they'll soon not have a single spot anywhere in the house that isn't somewhat slightly stained by brightly colored cat vomit.

TIP: When decorating color coordinate with your brand of dry cat food.

Bot Busting Could Have Significant Savings

The total reduction of 10's of gigabytes of bandwidth on my server alone due to bot stopping makes it easy to imagine that having such bot busting technology installed on every server in a datacenter could be an enormous savings in bandwidth second only to stopping spam. This technology could result in a small windfall for small hosting companies constantly being squeezed by service providers for more money by allowing even more customers on the same bandwidth currently being stolen without permission. Conceptually, blocking these leeches from an entire network could result in all sorts of additional savings.

The need to upgrade motherboards, especially for busy shared servers, could be significantly lessened. The older motherboards currently straining under the load, similar to how my newer dual Xeon was, would suddenly be more than adequate to continue to grow a business without any additional equipment expenditure. Being able to get a little more juice out of older equipment could allow datacenters to spend more on infrastructure instead of further lining the pockets of service providers.

Now the problem is how do you sell hosting companies a product that could actually hurt their revenues by cutting the bandwidth usage that results in additional charges?

Simple.

Offer the bot busting technology as an additional paid service labelled as a content and copyright control technology that reduces the ability of scrapers and aggregators to use their content without permission.

Seems like a natural for a control panel plug-in for Plesk, CPanel, etc.

More revelations coming soon.

I'll show you PRIVACY...

Another useless excuse for a website called PrivacyFinder crawls a few of your pages looking for your p3p privacy policy and combines that oh-so-useful information with Google and Yahoo search results.

They just got a taste of MY privacy policy today as their bot couldn't get past my front door.

If this little bit of information was useful to include in the SERPs and customers demanded to see it then Yahoo and Google could just include it in the first place.

Go find your privacy elsewhere and keep your ass off my server.

Stupid.

Link Your Ass to My Foot!

More link exchange bullshit as this excerpt was one of the best lately:

Dear Dipshit,

We are pleased to inform you that your website http://www.youreanawesomewebgod.com has been listed in our site http://www.stupidfuckingpondscum.org

You can find your link at: http://www.whogivesaflyingfuck.org/index.php

Links exchange means that you need to put reciprocal link to us.

Preferrable on this page http://www.notonmywebsitenotonyourlife.com

Our robot have checked this page for 5 times but our link wasn't found.

Threats about not linking to us and losing your link blah blah I think I shit my pants etc.
Guess what?

Your robot can use that phillips head screwdriver attachment and thoroughly fuck itself.

What kind of morons are harassing me with this nonsense?

I'll link to your web site about the same time I become a rock star and my balls start slapping Pamela Anderson's ass after a concert which is NEVER!!!

Now go away, eat shit and die, leave me alone.

Sunday, January 22, 2006

Block this Server Side Browser

Here's a real winner that must have a page of links somewhere that spiders crawl via their proxy server as it was looking like an attempted page hijacking in Google when it was discovered.

These slimeballs download your site via the proxy server, strip your javascript so the frame busters don't work, and slap their ads for pecker pills on the top of the page.

Ran into a similar site from China last year but they were embedding AdSense into the page and Google took care of them in short order.

Currently you can block them both via IP and the referrer as their proxy isn't terribly clever yet and leaves their domain name in the referrer string.
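
A quick sketch of the double check, assuming you keep your own sets of blocked IPs and blocked referrer domains (stored lowercase). The function name is hypothetical and the addresses in the usage note are documentation placeholders, not the actual offender.

```python
from urllib.parse import urlparse


def is_proxy_hijacker(remote_ip, referrer, blocked_ips, blocked_domains):
    """True if the request comes from a blocked IP or a blocked referrer domain."""
    if remote_ip in blocked_ips:
        return True
    # urlparse lowercases the hostname; a missing referrer yields None.
    host = urlparse(referrer or "").hostname
    if not host:
        return False
    # Match the domain itself or any subdomain of it.
    return any(host == d or host.endswith("." + d) for d in blocked_domains)
```

Called as `is_proxy_hijacker("198.51.100.4", "http://www.proxy.example.net/fetch", {"203.0.113.9"}, {"proxy.example.net"})` it catches the request on the referrer even though the IP is new. Once they get clever enough to blank the referrer, only the IP half still works.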

Some SEOs are as DUMB as Pet Rocks

Which is an insult to Pet Rocks.

Some of you know that one of my main websites is a niche directory that has been online since before the DOT COM boom and Yahoo was still mostly used as an adjective describing people living in the south.

Anyway, some genius SEO/Web Designer/Pinhead submitted a listing for a client to my directory today and did the following:

  • Put a bunch of keywords in the title, yes, nothing but keywords
  • Put a bunch of keywords in the description, if you can call it that
  • Put in HIS email address wrong with his company name butchered all to hell
  • Pissed me off
I swear to god my cat can hack up hairballs with higher IQs!

Come on, it's a DIRECTORY, not a SEARCH ENGINE so pull your SEO head out of your ass and act accordingly!

Is the word TITLE too complicated?

Is a simple DESCRIPTION of what's provided at the website mind blowing?

Couldn't you just cut and paste your email address without me having to mop up after you?

Showing that my IQ is above room temperature, unlike that SEO, I was able to figure out who this guy was in 2 seconds by pasting the butchered domain name on the email address into Google which ran its "people are stupid filter" and it figured out what it was supposed to be and landed me right on the guy's web site.

Quick check with WHOIS.SC popped up the same name as owning that domain that submitted the site and the IP address was geographically similar enough that I was sure it was the same person.

How do people get work for themselves when they can't even get their own email address right?

More importantly, who in the hell even types in an email address anymore with all these auto-form filling tools built right into the browser?

What a dweeb-assed shithead, I need a drink...

Copyright Crawlers? Get the gun!

The boatload of irrational 404 errors showing up in my web stats suddenly makes a lot of sense as my spider trap snared some asshole today bombarding my server with a shitload of requests for pages that don't exist.

Well guess what?

It appears the crawler was looking for any stolen copyrighted pages and abusing my server in the process.

Who gives these copyright protection services and tools the right to fucking attack my server requesting 100s of pages a minute and suck up my bandwidth dumping 404 pages?

Now that you can't scan my site maybe I'll just locate some of the pages you were trying to find, steal the damn things, and put a big FUCK YOU on the top of each of those pages.

Not like you'll ever see it but everyone else can laugh their asses off.

Fuckers.
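
If you want to spot this kind of 404 bombardment in your own logs, here's a minimal sketch that counts 404s per IP from Apache-style access log lines. The regex assumes the common/combined log layout and the threshold is a made-up number, so adjust both to taste.

```python
import re
from collections import Counter

# Captures the client IP and the status code from a common/combined log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def count_404s(lines):
    """Count 404 responses per client IP; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts


def flag_404_floods(lines, threshold=100):
    """Return a sorted list of IPs with at least threshold 404 responses."""
    return sorted(ip for ip, n in count_404s(lines).items() if n >= threshold)
```

Feed it an open log file and anything it returns is a candidate for the block list.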