Saturday, January 14, 2006

Cat Tales of Horror

When you first get your new cat you think about all the fun things like playing with the cat dancer, petting the warm fuzzy critter sitting on your lap purring, and all that pet lover crap.

What you get in reality isn't just a pet, it's a four-legged festival of bodily malfunctions designed to horrify, shock and destroy everything you own.

Don't get me wrong, I love my cat, but if you could install a zipper on his mouth and a cork in his ass life would be easier.


Strong ID

That's what it said on his papers when we bought the little fucker:
STRONG ID

We had no clue what that meant. Did it mean he responded well to his name?

A few days later we figured it out.

Our cat's shit was radioactive and smelled so bad it could send people running out of the room.

One day, we're sitting in the living room over 50 feet away from the litter box when this god-awful smell comes wafting through the house, and people run screaming and urping in terror to gather under the kitchen ceiling fan for protection.

Can someone explain to me how a very tiny kitten dropping a very tiny tootsie roll can make it smell like the fucking sewer just exploded and overflowed into my house?

Feeling much like Indiana Jones in the Nostrils of Doom, I pull my shirt up over my nose, yell "I'M GOING IN!", run into the bathroom, scoop that one tiny fragment from the litter box, flush it down the toilet and defuse the situation.

Gives me some respect for people who work on bomb squads right about then.

Luckily, this odiferous situation goes away when we get the little bastard neutered.


Barfy the Vomit Slayer

Twelve years later (ok, 2 days ago) my wife is running thru the house yelling "NO NO NO NO!" and heading for the bathroom. She is holding the cat she just scooped up off the furniture, where he sat in full-blown hairball hacking mode with his tongue hanging out, making that all too familiar sound:

WHUCK! WHUCK! WHUCK!

Luckily, once again we get him into the bathroom just in time to blow up on the tile instead of the sofa, bed, carpet, etc. Just like the postman that always rings twice, the cat always barfs twice so you have to close the bathroom door until he's done with his yakfest.


That Sinking Feeling

First, let's start with the fact that my cat refuses to drink water from a bowl. He stands in the bathroom sink and screams for someone to come turn on the water. It's like having a 2-year-old always asking for a drink. At least the cat is nice enough to yell when he's done so you can go turn the water off; a polite pet is always a bonus.

One day a few years back, after turning on the water he just keeps going back to the sink, over and over, for maybe thirty minutes. I'm wondering what in the hell is up, too much salt in the cat food?

Suddenly, my daughter starts screaming and yelling from her room. I run in to see the cat about 7 feet up in the air, on top of the TV, which is on top of the armoire. He is spewing buckets of water that cascade down the armoire like one of those fountains in the park. Trying to get the cat off the TV makes matters worse, as he goes the opposite way spewing water as he runs. Eventually the waterfalls stop and he jumps down on his own, running in a typical cat panic to hide under the nearest bed.


Who needs an alarm?

One morning I was sound asleep and woke up to some odd noise in the room:

WHUH WHUH WHUH WHUH

I look around the room and, much to my horror, I see the cat sitting atop the armoire catapulting balls of cat food across the room. His head turns back and forth and "WHUH!" another ball of brightly colored cat food shoots about 3 feet onto the carpet.

There's nothing I'm going to do about this until it's over, as I'm not taking a cat bomb in the face trying to get him down. The horror of it all.


Follow the Bouncing Ball

The cat comes running out of the bathroom one afternoon with something bouncing along behind him. He just keeps running from room to room with this little thing bouncing around behind him.

I know this is going to be bad and I should probably turn and run in the other direction but instead I catch the cat to see what in the hell is going on.

It appears kitty has somehow consumed a few of my wife's hairs and now has a bouncing ball of shit chasing him attached to 3 hairs still hanging out of his ass.

Just lovely. You know someone has to pull these hairs out of his ass, and you know who that someone is going to be.


Carpet Crawler

Ever see a dog or a cat sitting ass down on the carpet pulling themselves along by their front feet?

That's what my cat does when he hasn't been drinking enough and his shit gets stuck halfway out his ass. He runs out of the litter box and scoots down the hall until suddenly a little tootsie roll pops out on the carpet.

Have I mentioned I love my cat?

Computer Monitor

The cat is always on top of my computer monitor, as a matter of fact he's there snoozing right now as I'm typing this. Good thing the little fucker can't read or I'd probably lose a layer of epidermis for this rant.

Anyway, one day he comes flying out of the bathroom and jumps on the desk and monitor with a vengeance and I'm startled even more when he suddenly goes ass down on the top of the monitor vents and starts going in circles.

Fucking lovely - BOING! Tootsie roll.

Fine, now it's time to get the disinfectant and the paper towels to clean off the brown streaks on top of the monitor.

Did I mention I REALLY REALLY LOOOOVE my cat?


It's 3AM, do you know what that smell is?


My wife wakes up at 3AM yelling about the horrific smell in the room. Then she notices the smell is coming from the cat lying next to her in bed. Then she notices some stinky, smelly goo all over the bed near his tail.

Did you know cats have skunk-like glands in their ass that need periodic cleaning?

Neither did we until it just unloaded in our bed!

Nothing is more fun than changing sheets and deodorizing a room at 3AM.


What's Cooking?


One fine Saturday we're having lunch in the other room when a smell starts wafting across the house. I ask, "Honey, did you leave something on a burner?" and she replies, "Nope, and it's not coming from the kitchen either."

Suddenly the cat, looking scared shitless, bolts into the living room. That's never a good sign.

So I start walking around the house to see where the odor is coming from and as I get closer to my computer room the smell of burning circuit boards fills the room.

One look at the monitor shows me the problem. Smoke is billowing out of the top vents, which are covered in cat spew, and my HP M90 is officially toast.

ISN'T LOVE FUCKING GRAND?

OK, maybe love is only half-grand as that monitor cost $500 when it was new.

Fucking cat.

Friday, January 13, 2006

Incremental Genius and Scraping Epiphany

Sometimes an idea strikes and you just go EUREKA! because you know you've hit a home run.

Earlier I posted about sending trash data back to bad bots instead of just blocking them and that idea turned into a genius brainstorm.

SEND THEM ADSENSE BLOCKING WORDS AND STUFF KEYWORDS!

Scrapers crawl your site uninvited. Sending them piles and piles of nasty crap in response could disable AdSense on their pages and get them banned from AdSense, Google or BOTH.

How about a nice page of this:

Don't scrape my site or I'll get even with guns guns guns bazookas firearms guns guns guns nude erotic suicidal dead corpse etc. repeated over and over.

NOW we're going to have a little fun.

Some days a nice epiphany just puts a little bounce in your step, doesn't it?

Forbidden or Trashed Response

Judging from comments posted on various forums, the most common approach is to tell bad bots they've been forbidden from the server with a 403 error to make them go away.

However, the scraper may already have your content from previous scrapes. In that case, the best method may be to feed the bots placebo data. The idiots will just let it run until they've trashed all previous copies of your content.

Gonna give that a whirl for a few weeks and see what happens, should have some angry scrapers soon ;)
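Here's a minimal Python sketch of the two choices. It assumes you already keep a set of IP addresses caught in your spider traps. FLAGGED_IPS and the filler words are hypothetical placeholders, not anything from a real bot blocker. Flagged IPs get either a plain 403 or a 200 full of worthless text.

```python
import random

# Hypothetical set of IP addresses already caught by the spider traps.
FLAGGED_IPS = {"192.0.2.10"}

# Placeholder filler; any worthless text would do.
FILLER = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]


def respond(ip, real_page, mode="placebo", words=200, seed=None):
    """Return (status, body) for a request from `ip`.

    mode="forbid"  -> classic 403 Forbidden for flagged IPs
    mode="placebo" -> 200 OK with junk text, so the scraper keeps
                      replacing its stored copy with worthless data
    """
    if ip not in FLAGGED_IPS:
        return 200, real_page
    if mode == "forbid":
        return 403, "Forbidden"
    rng = random.Random(seed)
    junk = " ".join(rng.choice(FILLER) for _ in range(words))
    return 200, "<html><body><p>" + junk + "</p></body></html>"
```

The placebo response returns a normal 200, so a scraper has no error code telling it to stop.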

Gotta Stop Blogging

Wasn't paying attention today and my inbox was overflowing with people trying to give me money to advertise on my sites.

Nobody appreciates this shit anyway and it certainly doesn't pay the bills!

Who am I kidding, I'll have 4 posts before you know it...

Thursday, January 12, 2006

Scraper Thoughts

Do lazy scrapers just download other scrapers' sites?

Q: Why doesn't Google ban scrapers?
A: Professional courtesy

Q: How many scrapers does it take to change a lightbulb?
A: None, they'll just download your lightbulb

Q: What do you call 50 scrapers going over a cliff in a bus?
A: A good start

Q: How do you kill a scraper?
A: Slam the lid on their head when they try to take a drink

If you like this stupid shit let me know and we'll post more.

If you don't like this stupid shit, tough shit, write something better.

AdSense Running Low or Slow?

For the last few days my AdSense ad units have been inserting a lot more PSAs than usual. They seem to be either low on inventory or slow.

When I noticed this I reloaded the page a few times to see what was going on. It's hard for me to fathom, but reloading a page with 3 ad units showed real ads in 2 of them, then 3, then 1, then 2, then none, then 3 and so on and so forth.

This would indicate either a lack of available ads or that AdSense just can't connect to the ad server fast enough and drops in a PSA when it panics during a time delay.

Brings up many questions:

  • Could AdWords be running low on inventory?
  • Are advertisers just broke after Xmas?
  • Perhaps a new experiment in showing fewer ads for more revenue?
  • Maybe the network is just overloaded?
  • Could YPN be stealing that many advertisers?
Whatever the reason, it doesn't seem to be impacting my revenues so far.

Yahoo Labs?

I've never seen this Yahoo address before. I wonder what they were doing?

Name: rlx-2-4-10.labs.corp.yahoo.com
IP address: 66.228.182.210
OrgName: Overture Services (formerly Goto.com)
NetRange: 66.228.160.0 - 66.228.191.255

Accessed just a single page in the middle of my web site somewhere, very very odd.

So I expanded my log search based on the NetRange:

66.228.182.203
66.228.165.141
66.228.165.140
66.228.182.202
66.228.182.210
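Python's standard ipaddress module makes this kind of range check trivial. Here's a minimal sketch for anyone wanting to run the same log search. It assumes you've already pulled the client IPs out of your logs into a list.

```python
import ipaddress

# NetRange from the WHOIS record above.
NET_START = ipaddress.ip_address("66.228.160.0")
NET_END = ipaddress.ip_address("66.228.191.255")


def in_netrange(ip):
    """True if the dotted-quad IP falls inside the WHOIS NetRange."""
    addr = ipaddress.ip_address(ip)
    return NET_START <= addr <= NET_END


def hits_from_range(log_ips):
    """Unique IPs from the log that fall in the range, in first-seen order."""
    seen = []
    for ip in log_ips:
        if in_netrange(ip) and ip not in seen:
            seen.append(ip)
    return seen
```

That NetRange is exactly the single CIDR block 66.228.160.0/19. You can confirm this with `ipaddress.summarize_address_range(NET_START, NET_END)`.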

Just a handful of pages, very odd indeed, but one of the direct hits attempted to load an AdSense PSA alternate ad so was it a human or a new bot that executes Javascript?

It was only a handful of hits. They didn't load any graphics or anything else a normal browser would, and they hit one page twice.

Very curious.

Bot Busting Primer vs Security Concerns

Now that I'm pretty sure my bot busting techniques are working like a charm the big dilemma is at hand.

A. - Do I completely disclose all the bot busting techniques so that others can bust these scraping assholes too?

or

B. - Do I keep some of the secrets to myself so the scrapers can't adapt and appear invisible to the naked algorithm?

It's a real catch-22 in that disclosing what appears to be solid scraper stopping techniques could unwittingly let me get scraped all over again. Most likely my web site with 40K pages would still be safe from a complete scrape as you just can't hide that kind of activity but more subtle "update" scrapes just culling the most recent content additions would be easier to slide under the radar.

What to do, what to do...

At the moment, I think I'll do nothing except document it for my own purposes.

What happens after that is anyone's guess.

SNAP.COM? You Must Be Joking!

Wow, I haven't heard of Snap in ages. Then I saw some silly little bot claiming to be a beta crawler from Snap, so my curiosity was piqued and I pulled up their web site. If you haven't been there lately, go take a look. It's kinda cool but so slow you'll get a giggle.

The search results have a preview window that shows a snapshot of the web site. What's odd is that one of my web site previews shows fine in Internet Explorer. In Firefox it gets a generic screen with the message "The website http://www.snapsucksbigtime.com may not preview properly".

What the fuck?

They also frame people's sites which my javascript doesn't permit, maybe that's the issue.

I don't know. But it's just bizarre that they can show my preview in one browser and not the other, when everyone else's seems to work in both.

Free Anti-Virus and then some

What shocked me wasn't that the Google Pack contained mostly free rehashed stuff. It was the 6-month Norton Anti-Virus trial that raised my eyebrows. They must get some kickback from Norton to give it that placement in the Pack. Otherwise, why wouldn't they just include the free home version of AVG Anti-Virus?

Been using AVG to protect my laptop for a long time and it seems to work just fine.

Heck, where was the OpenOffice bundle, or PuTTY, the free SSH client?

Now if someone just could build a free version of PhotoShop then there would be no reason to pay corporate America for software ever again.

Wednesday, January 11, 2006

We Will Block You

Sung to the tune of We Will Rock You with apologies to Queen

Scraper with a bot made a big web crawl
Scrapin’s from my site be on Google for all
You got snips on yo’ site
You big web plight
Blockin’ your bot all thru the nite

We will we will BLOCK YOU!
We will we will BLOCK YOU!

Scraper you’re a pond scum low life ass
Postin’ in the forums gonna kick my ass
You got my stolen content
You better repent
Clickin’ your ads til the money is spent

We will we will BLOCK YOU!
We will we will BLOCK YOU!

Scraper you’re a moron not too bright
Bleedin’ from your eyes would just make my night
You got my crap on your site
You big web plight
Maybe AdSense will ban your account tonight

We will we will BLOCK YOU!
We will we will BLOCK YOU!

NOFOLLOW Straight To Jail, Do Not Pass Go

The next trick up my sleeve in the war against scrapers and badly behaved bots is to cloak NOFOLLOW tags into the pages when it's not a major search engine requesting the page.

Basically, the concept is to add NOFOLLOW to all major links on the home page, site map and site-wide navigation. Legitimate bots that honor NOFOLLOW will then stop at the front door, crawling only the few pages allowed by default. However, any crawler that follows links flagged with NOFOLLOW will get the hammer dropped on their ass.

One added touch is there will be a few spider-trap pages that always have NOFOLLOW on them, never allowed to be crawled by any bot, just like the current robots.txt file has some traps in it as well.

Building a better spider trap, how far will this obsession go?
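Here's a rough Python sketch of the idea, built on a few assumptions:

- The user agent whitelist and trap path are hypothetical.
- A real check would need to verify that a "Googlebot" really is Google, since user agents are trivially faked.
- The looks_like_crawler flag assumes you have some other way of telling bots from people. Humans click NOFOLLOW links all day long without knowing it.

```python
# Hypothetical whitelist of major search engine user agent tokens.
# User agents are easy to fake, so a real check would verify the requester too.
SEARCH_ENGINE_TOKENS = ("Googlebot", "Slurp", "msnbot")

# Hypothetical trap pages: always linked with nofollow, never legitimately crawled.
TRAP_PATHS = {"/trap/deep-archive.html"}


def is_major_search_engine(user_agent):
    return any(token in user_agent for token in SEARCH_ENGINE_TOKENS)


def render_link(href, text, user_agent):
    """Emit a nav link, adding rel="nofollow" for everyone but major engines.

    Trap links always get nofollow, no matter who asks.
    """
    if href in TRAP_PATHS or not is_major_search_engine(user_agent):
        return '<a href="%s" rel="nofollow">%s</a>' % (href, text)
    return '<a href="%s">%s</a>' % (href, text)


def should_ban(path, served_nofollow_paths, looks_like_crawler):
    """Decide whether a request earns the hammer.

    served_nofollow_paths: links this client was served with nofollow.
    looks_like_crawler: assumed to come from some other bot detection.
    """
    if path in TRAP_PATHS:
        return True
    # A crawler fetching a link it was told not to follow ignored the hint.
    return looks_like_crawler and path in served_nofollow_paths
```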

Hysterical User Agent

Someone who obviously doesn't know their ass from a hole in the ground tested my bot trap with a user agent name of "MSIE" claiming to come from "www.av.com" (AltaVista, if you didn't know). I just about fell off my chair laughing.

Sure don't know what they're giving customers to smoke over on Cox cable but I sure wish Comcast would get some of it and share.

Word and Internet Explorer are Loathsome Beasts

Wrote an article in MS Word, posted it to Blogger, looked just perfect in Firefox.

For some reason today I took a look in Internet Explorer and there is all sorts of shit in the page like this you don't see in Firefox:

<!--[if !supportEmptyParas]-->

<!--[endif]-->

Now I'll go spend time stripping some garbage out of the HTML thanks to you stupid MS jackoffs.

Microsoft just take your bullshit HTML conversion from Word and your stupid Internet Explorer and hunker down in a corner somewhere and go fuck yourself.
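For anyone else stuck cleaning up Word's HTML, a small Python regex takes care of the markers shown above. This sketch only handles that particular <!--[if ...]--> / <!--[endif]--> pattern. It leaves whatever sat between the markers alone.

```python
import re

# Matches Word's conditional comment markers such as
# <!--[if !supportEmptyParas]--> and <!--[endif]-->
WORD_CONDITIONAL = re.compile(r"<!--\[if [^\]]*\]-->|<!--\[endif\]-->")


def strip_word_conditionals(html):
    """Remove the markers but keep the content between them."""
    return WORD_CONDITIONAL.sub("", html)


# Usage:
# strip_word_conditionals("<p><!--[if !supportEmptyParas]-->&nbsp;<!--[endif]--></p>")
# gives "<p>&nbsp;</p>"
```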

WHAT DO YOU WANT?

I'm perplexed. This blog has two kinds of readers, thanks to MSN thinking we're a porn site and the rest of the world thinking I'm a techie dude. So I have no clue what my readers want to read.

Do you come for the latest research data I'm posting or do you come for the full tilt rants that I tend to make when I'm pissed?

So much to share, so little time, and there's only so much blogging to go around. Let me know if you want more technical garbage, or if you prefer it when I get all riled up, step off the deep end and make a total jackass out of myself.

Tuesday, January 10, 2006

Maximizing Advertising Exposure

Many web sites make their money strictly from third party advertising programs. These are becoming more of a challenge as various disruptive technologies thwart webmasters from capitalizing on their content. To get advertisements in front of all visitors, you need to deploy a battery of techniques that maximize the likelihood of advertisements being seen by everyone, including direct advertising you sell yourself.

Typical advertising methods and technologies

Various advertising models exist to meet the needs of both merchants and publishers of varying technical ability.

  • Pay Per Click (PPC) ads, such as AdSense or Yahoo Publisher Network, which rely on javascript to function.
  • Cost Per Impression (CPM) ads, typically served as banner or text ads by 3rd party networks such as FastClick, which rely somewhat on javascript but may work without it.
  • Affiliate Programs, such as Amazon or Commission Junction, which are typically just 3rd party banner servers but rely on cookies to track sales for the website.
  • Direct Advertising, consisting of sponsored links, banners and text ads. These may be served by 3rd parties like AdBrite, hosted on third party services, or served locally from the web server.

Weaknesses of advertising technology

Each type of advertising has a fundamental flaw: it only works if the visitor to the web site has that technology enabled, yet ads are blindly inserted into web pages. This blind technique may work for the majority of the website’s visitors, but the odds are that a significant share of ad impressions are failing.

  • Blocked cookies render affiliate programs useless
  • Disabled javascript stops PPC programs from working
  • Blocked 3rd party images stop banners from displaying although the links are still present
  • Ad blocking technologies like Norton Firewall typically disable all 3rd party and local CGI technology ad servers.
  • Contextual ads may display Public Service Ads (PSAs) due to stop words in your text, 3rd parties framing your site or a visitor using an anonymous proxy server which alters the page name originally indexed.

Simple Method to Maximize Ad Exposure

Many visitors to your website may only see a single web page. Your best bet of getting any advertising exposure whatsoever is a “buffet” approach, with a little taste of all types of ads on a single page. Because each type of advertising relies on different technology, this combination improves the odds that your web page will display at least one ad. This is not to condone putting a ton of ads on your web page to the point it looks spammy, so design the ads into the site intelligently.

  • Include a PPC ad, which assumes javascript is enabled
  • Include an affiliate banner or box ad, which assumes 3rd party ads and images aren’t blocked and cookies are enabled
  • Include affiliate text links, which won’t track impressions if 3rd party ads are blocked but will track sales if cookies are enabled
  • Include a CPM banner ad if you want, which assumes 3rd party ads aren’t blocked.
  • Embed direct advertising, such as sponsored links or banners, directly in the page. Use SSI (server side includes), PHP or some other server side technology that cannot be detected and disabled in the visitor’s browser.

The best way to describe this technique is “slinging mud at the wall and pray that something sticks”.

Simple Fallback Ad Strategy

Some advertising methods use javascript but don’t automatically supply a <noscript> alternative; AdSense is one. Add a <noscript> alternative yourself so ads still display when javascript is disabled. It doesn’t violate your AdSense terms of service either.

Example:

<script>
… AdSense Code…
</script>
<noscript>
… insert non-javascript banner or text ads here, affiliate program, etc.
</noscript>

Alternative Ads for PSAs

To further maximize your advertising exposure you need to make sure that all contextual advertising, or any other that provides this option, have alternative ads defined so that the ad space is being fully monetized and not displaying free Public Service Ads (PSAs).

Implementation will vary depending on which service you’re using. Alternate ads may even be daisy-chained: AdSense loads an alternate ad that also can’t be displayed, which loads another alternate ad, and so on and so forth. As a rule of thumb, alternate ad chains shouldn’t be more than 3-4 advertising networks deep. Otherwise the ad could take too long to display, and the visitor will have left the page or scrolled past and missed it entirely.

Dynamic Adaptive Advertising

Truly maximizing the revenues on a website requires some server side programming that can detect disabled javascript, disabled cookies and banner blocking. How this works is that a series of probes is inserted into the first page displayed to a visitor and the results of this probing cause the appropriate adjustments to the advertising being served in subsequent web pages.

Cookie Detection

Cookies can be set a couple of ways: sent from the server along with the page, or set by javascript running in the page. Doing both on the first page tells the server on the next request whether cookies and javascript are enabled, all in one shot. When cookies are disabled, affiliate tracking won’t work, so affiliate ads should be avoided in subsequent web pages.
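Here's a minimal Python sketch of the server side of that probe. It assumes the server set a cookie named srv_probe on the first page and the page's script set one named js_probe; both names are hypothetical. Whatever cookies come back on the visitor's next request tell the story.

```python
# Hypothetical cookie names: the server sends SERVER_PROBE with the first
# page, and a tiny script in that page sets JS_PROBE.
SERVER_PROBE = "srv_probe"
JS_PROBE = "js_probe"


def classify_visitor(cookies):
    """Interpret the probe cookies returned on the visitor's next request.

    cookies: dict of cookie name -> value from the request.
    """
    if SERVER_PROBE not in cookies and JS_PROBE not in cookies:
        # Nothing came back: cookies are off, so the javascript test is inconclusive.
        return {"cookies": False, "javascript": None}
    return {"cookies": True, "javascript": JS_PROBE in cookies}
```

If nothing comes back, the javascript question stays open. The NOSCRIPT image probe in the next section can settle it.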

Javascript Detection

Another method of detecting disabled javascript is to put an image in the NOSCRIPT tag of a javascript probe, served and tracked by a PHP script. When javascript is disabled, no PPC advertising (AdSense) should be placed in subsequent web pages.

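Example (js_probe and /probe.php are placeholder names, not part of any real service):

<script>
// Set a cookie to prove javascript is enabled
document.cookie = "js_probe=1; path=/";
</script>
<noscript>
<!-- Load a dynamic tracked image to verify javascript is disabled -->
<img src="/probe.php?js=0" width="1" height="1" alt="">
</noscript>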

Banner Blocking Detection

Something as simple as a cgi-bin based page counter can be used to detect banner blocking software as Norton’s Firewall tends to block images being served by a cgi-bin application as it’s assumed to be an ad server. When banner blocking is detected most 3rd party advertising networks and some local ad servers won’t work and should be avoided in subsequent web pages. However, if cookies are enabled affiliate text links will work but impression tracking for these links won’t function.
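One way to sketch that in Python is to record when a page carrying the counter image is served. If the image request never shows up within some timeout, assume something blocked it. The 30-second timeout and the visitor ids are assumptions. A missing image can also mean a text browser or a bot, not just banner blocking.

```python
import time


class CounterProbe:
    """Track whether the cgi-bin counter image on a page ever gets requested."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout  # seconds to wait for the image request (assumption)
        self.served = {}        # visitor id -> time the page was served
        self.hit = set()        # visitor ids whose counter image was fetched

    def page_served(self, visitor, now=None):
        self.served[visitor] = time.time() if now is None else now

    def counter_hit(self, visitor):
        self.hit.add(visitor)

    def banner_blocked(self, visitor, now=None):
        """True or False once decided; None while waiting or if never served."""
        if visitor in self.hit:
            return False
        started = self.served.get(visitor)
        if started is None:
            return None
        now = time.time() if now is None else now
        return True if now - started >= self.timeout else None
```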

Dynamic Advertising Matrix


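YES means that ad type can still be served under the condition; NO means it should be dropped from subsequent pages.

Condition             Affiliate Ads   CPC Ads   CPM Ads   Embedded Direct Ads
Cookies Disabled      NO              YES       YES       YES
Javascript Disabled   YES             NO        YES       YES
Banner Blocking       NO              MAYBE     NO        YES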
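The matrix boils down to a simple lookup. Here's a Python sketch, assuming the probes have already produced a list of detected conditions for the visitor. The condition keys are made-up names, and MAYBE is treated as "don't count on it".

```python
# The matrix above: which ad types still work under each detected condition.
# Treating "MAYBE" as "don't rely on it" is an assumption of this sketch.
MATRIX = {
    "cookies_disabled":    {"affiliate": "NO",  "cpc": "YES",   "cpm": "YES", "direct": "YES"},
    "javascript_disabled": {"affiliate": "YES", "cpc": "NO",    "cpm": "YES", "direct": "YES"},
    "banner_blocking":     {"affiliate": "NO",  "cpc": "MAYBE", "cpm": "NO",  "direct": "YES"},
}

AD_TYPES = ("affiliate", "cpc", "cpm", "direct")


def ads_to_serve(conditions):
    """Return the ad types marked YES under every detected condition."""
    allowed = []
    for ad in AD_TYPES:
        if all(MATRIX[c][ad] == "YES" for c in conditions):
            allowed.append(ad)
    return allowed
```

A visitor with nothing detected gets every ad type. Each detected problem only ever narrows the list, and embedded direct ads survive every condition.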

Summary

By using the various techniques outlined above anyone from a novice webmaster to an experienced web professional should be able to maximize their revenue potential.

Looking forward to your success stories!

Monday, January 09, 2006

Free Internet Could Slay Telephony

Everywhere you look people are talking about free wifi and free internet as if this commodity pays for itself. This must be leftover thinking from the dot bomb days where everything is free but you make it up on volume.

What these naive sacks of shit don't realize is that services such as Skype, or worse yet something FREE like VoIP on Yahoo Messenger, will also run on any free internet infrastructure. These free connections would have to be throttled too slow for VoIP. Otherwise these well-intentioned public works projects could end up undermining the very telephony companies whose backbones we depend on for a significant part of the internet.

Can you imagine telcos losing so much money that landline, DSL and internet connection fees skyrocket for the rest of us outside a free internet area? Or the companies simply collapsing and going bust, leaving the internet in shambles?

Tread lightly with all this hippy inspired communal free shit as your free internet could ultimately cost way more than you ever imagined.

Bad Bots Bad Bots, What Ya Gonna Do?

Looks like more of them are hiding now, trying to slither thru my site under the radar. It's just not working, because my new visitor control panel, aka radar detector (hehe), lets me see this activity at a glance without breaking a sweat.

Today's idiot bot-du-jour was much like yesterday's, a more well behaved variety of slow crawling scraper masked as a human, but stepped in the spider trap and >SNAP!< lost an IP address.

What I'm starting to think is this is just too much fuss and a better idea might be to just break all the navigation for spiders by converting everything to javascript navigation and then supplying authorized robots the normal version of navigation in <noscript> tags.

The only issue to resolve is whether the search engines would deem this cloaking. The content both the search engine and the end user see would be identical; only the navigation technology would change based on who requested the page.

The other idea I'm tossing about is to insert a captcha randomly after so many page views. The bot would be stopped dead in its tracks, as opposed to a human, who would type in the text and continue on their path. Humans rarely ever get into hundreds of page views, so interjecting captchas about every 40 pages, over and over, would pretty much bring bots to a screaming halt.

Of course there are blow-thru captcha tricks, where scrapers ask humans on other sites to enter the captcha data needed to get past these traps. But a series of random traps with random code on random page names might make it too hard for the scrapers to accurately identify a captcha. Then they'll simply download hundreds of captcha pages instead of content.
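Here's a minimal Python sketch of the randomized captcha interval. It assumes visitors are tracked by IP address and the captcha page itself is handled elsewhere. The 40-page average comes from above. The jitter of plus or minus 10 pages is an arbitrary choice so bots can't just count to 39.

```python
import random


class CaptchaGate:
    """Demand a captcha after a randomized number of page views per visitor."""

    def __init__(self, average=40, jitter=10, seed=None):
        self.average = average
        self.jitter = jitter
        self.rng = random.Random(seed)
        self.views = {}  # ip -> pages viewed since the last solved captcha
        self.limit = {}  # ip -> pages allowed before the next captcha

    def _new_limit(self):
        return self.rng.randint(self.average - self.jitter, self.average + self.jitter)

    def page_view(self, ip):
        """Count a view; True means serve a captcha instead of content."""
        if ip not in self.limit:
            self.limit[ip] = self._new_limit()
            self.views[ip] = 0
        self.views[ip] += 1
        return self.views[ip] > self.limit[ip]

    def solved(self, ip):
        """A human typed the text in: reset the count and pick a new random limit."""
        self.views[ip] = 0
        self.limit[ip] = self._new_limit()
```

Until the captcha is solved, every further request from that IP keeps getting the captcha. That's where a bot ends up downloading captcha pages instead of content.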

More crazy ideas coming as this episode of web warfare evolves.

Sunday, January 08, 2006

Bad Bots Trying Harder

Today some stupid bot came slow crawling like a human and masked itself as Internet Explorer but left telltale signs of a robot.

You ready for the best tell-tale?

Shockingly it looked at my robots.txt file as the first file of my site, probably trying to avoid my spider-traps.

Talk about stupid, you scrapers need a life as looking at robots.txt *IS* a spider-trap in itself!

MUAHAHAHA!

Ooops, let a secret out ;)
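Since the secret's out anyway, here's a Python sketch of how such a check could work. The browser marker strings are assumptions, and real user agents vary a lot. The logic is simply that something claiming to be a browser has no business asking for robots.txt.

```python
# Substrings found in ordinary browser user agents (an assumption;
# real browser strings vary and can be faked like anything else).
BROWSER_MARKERS = ("MSIE", "Firefox", "Safari", "Opera")


def robots_txt_trap(path, user_agent):
    """Flag a 'browser' that requests robots.txt; real people almost never do."""
    return path == "/robots.txt" and any(m in user_agent for m in BROWSER_MARKERS)
```

Honest crawlers like Googlebot announce themselves in the user agent. They are expected to read robots.txt, so they don't trip this check.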

AdSense PSAs Per Browser?

Well, Google AdSense is really pissing me off today. My Firefox browser is showing PSA ads on my own website's home page, yet the same page on the same computer shows real ads in Internet Explorer.

Other pages show ads in Firefox, just not my own home page.

Some days this bullshit really gets old and at this point I don't even care why I'm seeing PSAs on certain pages in just one browser as I'm thinking I should be seeing Yahoo ads there instead.