Saturday, March 04, 2006

Competition claims WE'RE NUMBER TWO!

Just about chuckled my ass off when I was snooping some of my competitors' sites, not the blogs silly, my money making site, the site that keeps me in expensive booze and wide screen TVs, not this bullshit.

Back to the story as I'm DigressJacking™ it already.

Anyway, this site we've discussed before as they undersell ad prices, send out newsletters begging for advertisers to fund new projects, etc. but now it gets even more amusing. Somehow this guy has mysteriously moved up the ranks in Alexa which is now listing him as #2 in his category right below me, oh whoop de doo, I've been ranked #1 in that slot in Alexa since they first set up shop, so you bumped someone to get to #2, big fucking deal. Funny he doesn't mention anything about Google's Directory by Rank that has me still at #1 for years and he's way down the list over there, not a peep.

Then he calls out one other place where he's #2 below me for some bullshit meaningless 2 keyword search term in Google that doesn't even drive traffic.

Excuse me?

Is that smoke I feel blowing up my ass?

Hate to burst your bubble buddy boy but I'm in the top 10 for 2 keywords that are tops in the field, ONE WORD, NOT TWO, yes a SINGLE KEYWORD that drives thousands of visitors daily and it's not a BULLSHIT TERM, people actually actively fucking search this term!

So then he goes on lamenting about how his ads are a better value at his rock bottom bargain basement prices than "some other more expensive sites".

Give me a fucking break, I send people more traffic from my ads in a day than you sometimes send in a month. He knows it's true too because he used to have a page showing the traffic for all his ads and took it down, good thing too, it was embarrassing.

I've been toying with making any mention to his references but so far all I did was put up a nice bold line of text on my ad page that says "We may cost more but you get what you pay for - performance" and left it at that.

I'm thinking at this point the best way to deal with this putz is just ignore him and let him dig a deeper hole until Google determines he's supplemental like they just did to an even lesser competitor and relegate them to the search engine dungheap.

Wrong Number, Hang Up Already

Well, someone with their Nokia phone just didn't take WRONG NUMBER for an answer and tried to beat the shit outta my server looking for a place that would take their call. Nokia6600/1.0 (4.09.1) SymbianOS/7.0s Series60/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.0

Actually, it looks like Yahoo might've done this on their behalf as the IP address resolves to so it might not have been the phone that was so insistent.

03/04/2006 05:30:53 "/"
03/04/2006 05:30:53 "/mob"
03/04/2006 05:30:53 "/index.wml"
03/04/2006 05:30:53 "/index.xhtml"
03/04/2006 05:30:54 "/default.wml"
03/04/2006 05:30:54 "/default.xhtml"
03/04/2006 05:30:54 "/home.wml"
03/04/2006 05:30:54 "/home.xhtml"
03/04/2006 05:30:55 "/mobile"
03/04/2006 05:30:55 "/mobile/index.wml"
03/04/2006 05:30:55 "/mobile/index.xhtml"
03/04/2006 05:30:55 "/mobile/default.wml"
03/04/2006 05:30:55 "/mobile/default.xhtml"
03/04/2006 05:30:56 "/mobile/home.wml"
03/04/2006 05:30:56 "/mobile/home.xhtml"
03/04/2006 05:30:56 "/mob"
03/04/2006 05:30:56 "/mob/index.wml"
03/04/2006 05:30:56 "/mob/index.xhtml"
03/04/2006 05:30:57 "/mob/default.wml"
03/04/2006 05:30:57 "/mob/default.xhtml"
03/04/2006 05:30:57 "/mob/home.wml"
03/04/2006 05:30:57 "/mob/home.xhtml"
03/04/2006 05:30:57 "/wml/index.wml"
03/04/2006 05:30:58 "/wml/default.wml"
03/04/2006 05:30:58 "/wml/home.wml"
03/04/2006 05:30:58 "/xhtml/index.xhtml"
03/04/2006 05:30:58 "/xhtml/default.xhtml"
03/04/2006 05:30:58 "/xhtml/home.xhtml"
03/04/2006 05:30:59 "/wap/index.wml"
03/04/2006 05:30:59 "/wap/index.xhtml"
03/04/2006 05:30:59 "/wap/default.wml"
03/04/2006 05:30:59 "/wap/default.xhtml"
03/04/2006 05:31:00 "/wap/home.wml"
03/04/2006 05:31:00 "/wap/home.xhtml"
Sorry, if you'd like to dial my website, tough shit.

Keep your hands on the fucking steering wheel.

gnootBot still going and going....

As reported the other day, we seem to be the first ones reporting on gnootBot. It has never downloaded a single real page from our site but somehow seems to know every page that's on my server and keeps slowly asking for pages day after day.

Wonder where the page names came from?

Possibly crawled me before installing the bot blocker or downloaded a list of pages from a search engine or some shit.

Doesn't matter as it's still going and there's obviously nobody at the wheel as they are getting nothing but error messages.

Hope it's worth your time when it's over asshole.

FIRST SIGHTING: Sproose Goose got Plucked

Yet another Silicon Valley startup search engine called Sproose came crawling this morning tagged as sproose/0.1-alpha using Nutch. Well, in their site it claims they have seed funding from VC's, also reported elsewhere, but you can't do any searches yet as they are currently building their Knowledge Rank™ which sure sounds a lot like PageRank, huh?

The first knowledge they got when they hit my site was that they didn't rank high enough to crawl my content and got the door automatically slammed in their faces by being an unauthorized bot. Sorry boys, robots.txt is so 90's, we use razor wire around the compound to keep people out these days.

You may have seed capital, but I require being wined and dined before you may crawl my 40K pages, or just email me with a PLEASE as this entitlement mentality to crawl every site and run up our costs online just because you have been funded is BULLSHIT.

BTW, the people that wrote NUTCH should be hauled out in front of a firing squad and shot as I'm seeing more and more crawling from their little engine that couldn't constantly bouncing off my site.

Friday, March 03, 2006

POP QUIZ: Your Site Already Has a Spider Trap?

Most of you web site owners already have a spider trap on your web site and you don't even know it. There are about 3 pages that humans almost NEVER READ and spiders gobble up daily so all you have to know is which pages these are and then grep for them in your access logs and VOILA! you see a list of mostly spiders hitting your web site that can be blocked at will.

Once you get a list of who's been looking at your spider trap pages, simply take each IP in the list and then grep for all activity for that IP in your access log. When you see hits to all pages and no images loaded it's a clincher you got a spider, but just don't get carried away and block Google/MSN/Yahoo.
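For anyone who would rather script the two grep passes, here is a minimal Python sketch of the same idea. It assumes an Apache "combined" style access log, and the trap path in the usage note is a made-up placeholder (the quiz answer stays a quiz). The function name `find_suspects` and the image extension list are my own choices for illustration, not anything from a real tool.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" style lines. Adjust the pattern for your server.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*" \d{3} \S+')

# Assumption: these extensions cover the images a real browser would load.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")


def find_suspects(lines, trap_paths):
    """Return {ip: hit_count} for IPs that hit a trap page and loaded no images."""
    paths_by_ip = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ip, path = match.groups()
            paths_by_ip[ip].append(path)

    suspects = {}
    for ip, paths in paths_by_ip.items():
        hit_trap = any(p in trap_paths for p in paths)
        # Strip any query string before checking the file extension.
        loaded_image = any(
            p.lower().split("?")[0].endswith(IMAGE_EXTENSIONS) for p in paths
        )
        if hit_trap and not loaded_image:
            suspects[ip] = len(paths)
    return suspects
```

Usage would be something like `find_suspects(open("access.log"), {"/your-trap-page.html"})`, then eyeball the list before blocking anything, since a text-only browser or a legit search engine will show up here too.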

Even if an entry in your access log says Googlebot as the user agent it may not be Google, so check out where the IP resolves and make sure it's in the domain with a reverse DNS lookup, which you can do on DNS Stuff if you don't have other tools available.
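If you'd rather do that check in code than on a web tool, here is a rough Python sketch of a reverse DNS lookup followed by a forward lookup to confirm the name isn't faked. The function name `verify_crawler_ip` and the `reverse`/`forward` parameters are hypothetical, added so the lookups can be swapped out for testing, and the domain suffix you pass in is whatever the search engine publishes for its crawlers. It only handles IPv4.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves into an allowed domain AND that
    hostname resolves forward to the same ip (IPv4 only)."""
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no reverse DNS entry at all

    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False  # resolves, but not into the claimed domain

    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False  # hostname does not resolve forward

    # Anyone can put any name in their own reverse DNS, so the forward
    # lookup has to point back at the same address.
    return ip in addresses
```

The suffix should include the leading dot (for example `".crawler.example.com"`, a placeholder) so that a lookalike such as `evilcrawler.example.com` doesn't slip through.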

Now, anyone want to guess which 3 pages or files on a web site are spider traps?

I know I give you all a lot of information but anyone should be able to figure this out by staring at any typical web site and seeing which links you would never click.

If nobody can figure it out MAYBE I'll tell you on Monday, if I'm in the mood and if I remember.

Come on people, POP QUIZ! post your guesses, don't be shy!

Wednesday, March 01, 2006


Well here's a new combination that really pissed me off with a scraper and referer spammer all rolled into one. Saw this spunk monkey crawling my site, the bot blocker was already stopping his ass, but something caught my attention in that every page crawled had the same referer as the origin. Sure enough, when I went to look at the site attempting to get notoriety from my access log it was a new directory web site that was using scrapings to get attention to the site for people submitting listings.
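That "same referer on every page" tell is easy to check for mechanically. A minimal sketch, assuming you have already parsed your log into `(ip, path, referer)` tuples; the function name `single_referer_clients` and the `min_hits` threshold of 5 are my own illustrative choices, not settings from the bot blocker.

```python
from collections import defaultdict


def single_referer_clients(requests, min_hits=5):
    """requests: iterable of (ip, path, referer) tuples.
    Return {ip: referer} for clients that sent the exact same non-empty
    referer on every one of at least min_hits requests."""
    referers = defaultdict(set)
    hits = defaultdict(int)
    for ip, _path, referer in requests:
        hits[ip] += 1
        referers[ip].add(referer)

    flagged = {}
    for ip, seen in referers.items():
        if hits[ip] >= min_hits and len(seen) == 1:
            only = next(iter(seen))
            # An empty or "-" referer is normal for bots and typed-in URLs.
            if only not in ("", "-"):
                flagged[ip] = only
    return flagged
```

A real visitor's referer changes as they click from page to page, so a client that reports one identical external referer for every request is worth a closer look.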

Holy shit, you mean to tell me people are stupid enough to click "submit url" when they can plainly see all the listings are junk that don't even have valid links out?

Apparently they are that stupid as the bottom of each so-called directory page appears to be actual submitted listings as opposed to the scraped crap content without outbound links at the top trying to snare search engine love.

Now I'm steamed, this is NOT a pretty turn of events.

New AdSense Setup Wizard!

Apparently, as more less-technical types have joined AdSense in droves, Google decided to dumb down the AdSense setup interface to reduce the tech support load and dependence on certain browser features.

The tabs across the top of the AdSense website for AdSense for Content, AdSense for Search and Referrals have all been replaced with a new tab AdSense Setup which now contains all the options for the previous tabs.

When you click on the new link for AdSense for Content you're in a 3-step wizard that drives you thru the process.

  1. Choose Ad Type (ad unit/link unit)
  2. Choose Ad Format and Colors
  3. Get Ad Code

My suspicion is many confused people were accidentally changing options and getting the wrong code, or forgot to get the code, etc. and this will simplify the process for that overloaded page. It also makes it more clear which ad type you're getting as some complaining noobs in forums weren't understanding why they only got links and not text ads, so you know those poor people in AdSense support were crying in their beer every night.

Problem is, anyone with a serious amount of channels trying to get the code will be suicidal with this back and forth nonsense between those two pages. Anyone want to join a class action for Google inducing carpal tunnel with this new page? Just kidding, well, maybe.

I can see why some of these changes were needed for them but it sure would be nice to have an "expert mode" for us AdSense old-timers that actually are power users of the page they just ripped apart.

The only thing I can't believe they didn't fix in this update is making the CHANNEL page accept multiple URLs in a textarea instead of one URL at a time. Try setting up 200 URLs with this stupid page as opposed to a quick cut-n-paste from your list of page names on your web log analyzer and you'll see what I mean, it's brutal and stupid, yet it remains unchanged.

Now for the million dollar question:
"What are they about to add to AdSense that required this overhaul to make more space?"

Let's keep an eye out and see what happens.

Tuesday, February 28, 2006

Comment Wars in Rantville

Normally I wouldn't just blog about a stream of comments but this article about Referer Spammer Revenge is almost a month old and the shit just hit the fan yesterday. I'm thinking the person I mentioned figured out it was me posting about them from my comment at Spam Huntress and went off the deep end.

We're talking some serious flaming here, cracked me up, but you be the judge.

Let this be a lesson to any children that may be reading this shit as it proves you shouldn't stick your finger too far up your nose as you can impair your thinking if you accidentally stab your brain and you'll end up a referer spammer too.

FIRST SIGHTING - New Bot Discovered

It's very rare I run across a bot you can't find any details about in Google but last night something calling itself "gnootBot" came from which just has a default Apache web page.

No information about this beast except it started asking for pages in the middle of my website.

Very bizarre.

Monday, February 27, 2006

Peeling the Scraper Onion with Reverse DNS

Stopping scrapers appears to be like peeling an onion in that when you peel away one layer of bad bot activity you unearth yet another whole new layer that you couldn't see before. You start with user agent filtering then put up speed bumps and honeypots to stop others and even profile other behavior to stop them and they still keep coming. Now that we're several layers into this scraper onion, all sorts of new things are showing up that require more sophisticated methods to detect and block as they're hiding as browsers and running low key crawls, but it's still obvious to anyone that it's not a human if you look at the pattern of access.

To help thwart more of this nonsense the latest tool added to the bot blocking arsenal is reverse DNS lookups to see where the IP originates from and blocking or challenging bad sources from the start. A few trends have become very obvious: many scraper IPs that were auto-blocked either didn't resolve to a domain name whatsoever or came from some suspicious hosting farms, and the two most notable and persistent ones have been in Taiwan and the UK.
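To give a feel for the idea, here is a small Python sketch of sorting visitors by reverse DNS. This is not the actual bot blocker code. The policy shown (challenge anything with no reverse DNS, block anything resolving into a listed hosting domain) is my assumption about one reasonable way to wire it up, and `classify_ip`, the `reverse` parameter, and the suffix list are hypothetical names for illustration.

```python
import socket


def classify_ip(ip, blocked_suffixes, reverse=socket.gethostbyaddr):
    """Return "challenge", "block", or "allow" based on reverse DNS alone."""
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        # No domain name at all: suspicious, but plenty of real users sit
        # behind unnamed IPs too, so challenge rather than block outright.
        return "challenge"

    if any(hostname.endswith(suffix) for suffix in blocked_suffixes):
        return "block"  # resolves into a hosting farm on the list
    return "allow"
```

Reverse lookups are slow, so in practice you would want to cache the answer per IP instead of resolving on every request.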

Now they'll need to find yet another way to get around the bot blocker as the newly installed steel door now has a chain, 2 deadbolts and a mean as piss rottweiler waiting on the other side just in case.

Stay tuned for more on the next episode of As the Onion Peels.