Saturday, October 28, 2006

Ignoring my adoring fans, both of them!

I feel like I've been ignoring all of you lately but it's not true. I've just been so damn busy programming my little ass off, updating massive databases, writing web pages and PowerPoints, ordering custom embroidered polo shirts, so on and so forth, it's just crazy.

Sadly, I feel like the blog has recently become the red-headed stepchild that nobody is playing with, not even the dog, even if you hung a pork chop around the poor kid's neck.

Seriously though, trying to get a massive update completed on an old site and roll a new product out the door at the same time is crazy stuff. Top it off with getting ready to speak at PubCon Vegas in November and SES Chicago in December, and I'm burning the candle at both ends, but that hot wax feels OH SO GOOD on my nipple, but that's a different post.

It's like I've become some demoniacally possessed worker bee or some shit and I just can't get enough. I'm spinning out of control and probably heading for a serious burnout but it will be worth it as there are press releases and shit that are going to hit the web on a couple of fronts before the end of the year and I'm stoked.

Hell, I haven't even been going out to see movies, gamble or haunt strip clubs in almost 2 months, but I'm sure a week in Vegas at PubCon will solve the gambling and hookers, um, strippers issue.

Don't get too upset though, I'm still making it out at least twice a week for a nice 2-3 hour lunch with a friend and a LOT of beer.

Gotta treat myself right a little ;)

Thursday, October 26, 2006

Google, Yahoo and MSN Like Indexing Pure Garbage Sites

The other day I was working on a link checking filter so I could comb many thousands of linked sites and automatically eliminate from my index all sites that no longer contain valuable content.

What I did was make a filter that checks the profile of information on the page, looking for signals that a site has reverted to a default registrar page or default hosting page, or has become part of a domain park or scraper site.
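
For the curious, here's a minimal sketch of what that kind of page profiling could look like. It is not the actual filter: the signal phrases, the function names and the word-count threshold are all made up for illustration, and a real filter would need a much longer list tuned against real parked pages.

```python
import re

# Hypothetical signal phrases; the post does not list the real ones.
PARKED_SIGNALS = [
    r"this domain (name )?is for sale",
    r"domain (is )?parked",
    r"buy this domain",
    r"this (web ?site|page) is (under construction|coming soon)",
    r"default (web ?site|web) page",
    r"sponsored listings",
]
_PATTERNS = [re.compile(p, re.IGNORECASE) for p in PARKED_SIGNALS]


def strip_tags(html):
    """Crude tag stripper: drop script/style blocks, then all other tags."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def profile_page(html):
    """Return which signal patterns matched and how many words of text exist."""
    text = strip_tags(html)
    hits = [p.pattern for p in _PATTERNS if p.search(text)]
    return {"hits": hits, "words": len(text.split())}


def looks_like_garbage(html, min_hits=1, max_words=150):
    """Flag short pages that match at least min_hits signal phrases.

    The word limit (an assumed value) keeps a long, real article that merely
    mentions domain sales from being thrown out.
    """
    profile = profile_page(html)
    return len(profile["hits"]) >= min_hits and profile["words"] <= max_words
```

You'd run `looks_like_garbage()` over the fetched HTML of each linked site and drop the ones that come back `True`. Combining it with a whois check would be a separate step not shown here.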

After successfully detecting and filtering out many sites that had fallen by the wayside, I started to wonder if the search engines actually indexed all of this crap.

Sure enough, a quick check of Google, Yahoo and MSN confirmed that the search engines eat these shit sites like candy although they can be easily detected and eliminated either by profiling the page content or checking the whois information, or a combination of both.

What purpose does indexing these millions of garbage web sites serve for any search engine?

I mean seriously, the scraper spam sites are one thing, but these are so easily detected there's no rhyme or reason for them to show up as results for any search, since they are 100% crap.

Anyone from one of the major search engines mind dropping a note to explain why hundreds of thousands of cloned garbage sites are being indexed?

We'd really love to hear from you on this topic, please feel free to post a comment :)

Sunday, October 22, 2006

Scrapers Abandoning My Site?

In a rather unusual turn of events, it appears my bot blocking efforts have gone way better than expected, to the point they might be backfiring. Not only have some of the more serious scrapers stopped including my content in their sites, as they're being burned in the search engines (thanks guys), but new appearances of directly stolen content have gone down drastically.

However, what I'm noticing is a new trend: sites that used to scrape my server now appear to be just scraping snippets off of other websites, which I mentioned recently. Where this became most apparent was when I recently launched a boatload of new pages that had some breadcrumbs cloaked into all of them, breadcrumbs that are not served to the search engines. A few weeks after releasing the new pages I went searching for references to these pages in Google, Yahoo, etc. and sure enough found some, but they didn't contain my breadcrumbs.

Doing a bit of quick investigation showed that the snippets indexed in Yahoo and Google actually came from the search engines themselves. This means that my site is being bypassed completely and the search engines are now the target for what little content the scrapers can get to use from my site.
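
If you haven't seen the breadcrumb trick before, the idea can be sketched roughly like this. Everything here is an assumption for illustration, not the code running on this site: the secret key, the `bc` + 12 hex digit marker format, the in-memory `SERVED` log and the simple user agent check are all hypothetical, and a user agent string can be faked, so a real setup would verify crawlers some other way.

```python
import hashlib
import hmac
import re

SECRET = b"change-me"  # hypothetical key, keep it private
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")  # substrings of crawler user agents
SERVED = {}  # breadcrumb -> (ip, day); a real site would log this to disk
BREADCRUMB_RE = re.compile(r"\bbc[0-9a-f]{12}\b")


def is_search_engine(user_agent):
    """Naive check based only on the user agent string."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def make_breadcrumb(ip, day):
    """Derive a short marker that is unique per visitor IP and day."""
    digest = hmac.new(SECRET, f"{ip}|{day}".encode(), hashlib.sha256).hexdigest()
    return "bc" + digest[:12]


def render_page(body, ip, day, user_agent):
    """Append a breadcrumb for ordinary visitors, never for search engines."""
    if is_search_engine(user_agent):
        return body
    crumb = make_breadcrumb(ip, day)
    SERVED[crumb] = (ip, day)
    return f"{body} <span>ref {crumb}</span>"


def find_breadcrumbs(text):
    """Pull any breadcrumb markers out of text found on another site."""
    return BREADCRUMB_RE.findall(text)
```

Copied text that contains a marker was lifted from a page served directly to whoever is in `SERVED` for that marker. Copied text with no marker at all, like the snippets above, most likely came from somewhere the breadcrumb was never served, such as a search engine's results.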

Remember, I installed NOARCHIVE, NOCACHE and did a bunch of other things to minimize my exposure to the scrapers via the search engines many months ago, yet they're still scrambling for the last few scraps they can get.

Just goes to show you how desperate these assholes are for any little scrap of information that ranks high.

Kind of sad that the search engines can't tell they're eating their own dog food though...

Spam Free Accomplishment Zone

Just thought I'd post a follow-up now that my latest anti-spam measures are in place. It's been so blissfully quiet that I haven't even bothered to rant about these idiots, or much of anything else lately, because I've actually been doing more productive work.

Sure, the spammers keep knocking at the doors, banging on the walls, tapping on the windows, but it all falls silently into my spam log, just so I can keep track of what kind of trash was at the door, and gets swept away. I see none of it.

The last few weeks were absolutely amazing. I'd forgotten just how much work a person can get done when they aren't constantly cleaning up after those fucking spammers trying to shit all over every web form they can find.

Sorry spamming assholes, your days are numbered and I'm loving every minute without you.