Friday, June 09, 2006

My sites don't need UPDATED!

How many comparison shopping sites can one planet possibly need?

Well, whatever your answer was, it's wrong: there's one more, called Updated, crawling around.

The user agent info:

38.119.96.110 "updated/0.1-alpha (updated crawler; http://www.updated.com; crawler@updated.com)"
Block it, don't block it, I don't care...

Thursday, June 08, 2006

Google Dance Makes You Shit Your Pants

What a day.

Got up this morning and my main site's traffic was wonky, so I took a look around and Google traffic seemed to be askew. Checked a few data centers and my position was all over the map. Up and down, round and round, Google's chewing me up and spitting me out in all sorts of wacky places.

My default datacenter now shows me back in the top 10, but only God and Google know where it'll be tomorrow and neither of them is talking.

What a mess, Google better have a new liver standing by for me...

Anonymous Media, just can't make this shit up

Here comes another scraper with a mission called Anonymous Media and they want your shit.

63.133.162.98 - "GET /robots.txt HTTP/1.0" 200 146 "-" "Anonymous/0.0 (Anonymous; http://www.anonymous.com; noreply@anonymous.com)"

Right on their website it says they:

  • SPY "anonymous watches what consumers watch"
  • SCRAPE "anonymous compiles data from multiple sources"
  • PROFIT "anonymous generates market reports and analysis"
  • TARGETS YOU "anonymous conducts custom market research"
OK, is it just me or are we all getting SICK OF THIS SPYBOT SHIT?

Maybe we should just install 301 redirects to other spybot companies, so that when a spybot comes crawling it gets pointed at a competitor, and just let them crawl each other's sites to death.
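
A minimal sketch of that redirect idea, in Python. The `SPYBOT_TARGETS` table and the `spybot_redirect` helper are made up for illustration; the user-agent tokens and sites are the ones from the log lines on this blog, and spotting a spybot by a substring of its user agent is only an assumption about how you'd do it.

```python
# Hypothetical table: user-agent token -> some other spybot's site to send them to.
SPYBOT_TARGETS = {
    "anonymous/": "http://www.updated.com/",
    "updated/": "http://www.anonymous.com/",
}

def spybot_redirect(user_agent):
    """Return (status, headers) for a known spybot, or None for everyone else."""
    ua = user_agent.lower()
    for token, target in SPYBOT_TARGETS.items():
        if token in ua:
            # 301 Moved Permanently, pointing at the competitor.
            return 301, {"Location": target}
    return None
```

Anything that returns `None` gets served normally; whatever web server or framework you run would turn the tuple into the actual response.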

RED ALERT UPDATE - SnapBot and the Linux Firefox Revelation

After weeks of these idiots bombarding my servers with Linux Firefox for unknown reasons, I finally connected all the dots: SnapBot is apparently running from 3 data centers that I know about, and all 3 have run both the SnapBot crawler and Firefox from the same IPs.

65.38.102.0/24 "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060124 Firefox/1.5.0.1"

38.98.19.0/24 "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060124 Firefox/1.5.0.1"

66.234.139.0/24 "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060124 Firefox/1.5.0.1"

You know what SnapBot was doing from these 3 locations?

Making goddamn screen shots of every fucking webpage, or trying to anyway, all 40K+ of them too!

Fuckers.

I know this for a fact as I found a picture of my error message about their site being banned with their IP number embedded in it from the 65.38.102.0/24 block and the others are already known Snap IPs.

Listen assholes, take a picture of the home page and leave it at that, you aren't getting 40K screenshots and anyone that thinks they are can blow that shit out their ass.
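
For anyone who wants to check their own logs against those three blocks, here's a rough sketch using Python's standard `ipaddress` module. The `SNAP_BLOCKS` list and the `is_snap_screenshotter` name are mine, and requiring both the IP block and the exact Firefox user agent to match is an assumption, not a description of my actual blocker.

```python
import ipaddress

# The three /24 blocks from the log lines above.
SNAP_BLOCKS = [ipaddress.ip_network(n) for n in
               ("65.38.102.0/24", "38.98.19.0/24", "66.234.139.0/24")]

# The Linux Firefox user agent those blocks kept sending.
SNAP_FIREFOX_UA = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) "
                   "Gecko/20060124 Firefox/1.5.0.1")

def is_snap_screenshotter(ip, user_agent):
    """True when the request comes from a listed block AND uses that user agent."""
    addr = ipaddress.ip_address(ip)
    in_block = any(addr in net for net in SNAP_BLOCKS)
    return in_block and user_agent == SNAP_FIREFOX_UA
```

A real Linux Firefox user from some other network sends the same user agent but fails the IP test, so they don't get caught by this.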

Don't Worio, just another search engine

Something new hit my site claiming to be Worio and planning to go live in 2006 according to their website. Another site using Heritrix to crawl, oh joy, which is almost becoming as annoying as the new-nutch-site-of-the-week club.

BAD_AGENT: 198.162.51.70 [worbo2.cs.ubc.ca.] requested 1 pages as "Mozilla/5.0 (compatible; heritrix/1.6.0 +http://www.worio.com/)"

What makes Worio special is it's supposed to be created for computer scientists and programmers.

OK, there goes my lunch.

Don't Worio, Be Happy.

Wednesday, June 07, 2006

REJECT SPAM, DO NOT BOUNCE!

I've been setting the email prefs on all my servers to REJECT unknown addresses instead of BOUNCE. Bouncing opens your server up to all sorts of spam problems as your email queue fills up with undeliverable garbage and your server starts to choke.

Thought I was pretty clean, but I imported some domains from another server and suddenly started seeing 150+ emails piled up every day. They were all failure notices just sitting there chewing up my box trying to deliver all day long.

Got sick of that shit last weekend and did a complete email audit and now ALL DOMAINS are set to REJECT unknown addresses.

Guess how many outbound emails are pending now?

ZERO!

Been zero since I did it and holding strong so spammers using randomname@mydomain.com are no longer bothering me whatsoever.
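
If the difference between REJECT and BOUNCE isn't obvious, here's a bare-bones sketch of the reject side in Python. The `KNOWN_MAILBOXES` set and the `rcpt_to_reply` function are hypothetical stand-ins for whatever your mail server does internally; the 550 reply with the 5.1.1 status is the standard SMTP way of saying "no such mailbox".

```python
# Hypothetical list of real mailboxes; a real server would check its own user database.
KNOWN_MAILBOXES = {"postmaster@mydomain.com", "webmaster@mydomain.com"}

def rcpt_to_reply(address):
    """Pick the reply to RCPT TO while the sending server is still connected."""
    if address.strip().lower() in KNOWN_MAILBOXES:
        return 250, "2.1.5 OK"
    # Refusing here means the message is never accepted,
    # so there is nothing to bounce and nothing lands in the outbound queue.
    return 550, "5.1.1 User unknown"
```

With BOUNCE the server says 250 to everything, takes the message, and then has to generate a failure notice to a sender address that is usually forged. That's where the piled-up queue comes from.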

Just a reminder, check your email accounts and make sure you're locked up tight!

Block them PRONTO

Roll out the RedCarpet and let them scrape all they want.... NOT!

We found this little beast bothering us:

BAD_AGENT: 66.45.38.59 [unknown] as "RedCarpet/1.3 (http://www.pronto.com/robots.html)"

On their web site it says:

"We use our patent-pending software [crawler] to scour the web [scrape] and identify the most merchant sites and products possible for you."

What a bunch of happy horseshit, web crawlers are now patent-pending software?

Go take a long walk off a short pier and jump into lake go-fuck-yourself.

PlanetLab comes knocking

Here we go again with what looks like someone testing my PlanetLab barriers to see which proxies do or don't work. This just started a few minutes ago and will probably get more interesting in an hour or two.

Note the same browser from each location:

BANNED: 128.4.36.12 [planetlab2.pc.cis.udel.edu.] requested 1 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
BANNED: 129.10.120.111 [planetlabone.ccs.neu.edu.] requested 1 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
BANNED: 155.98.35.2 [planetlab1.flux.utah.edu.] requested 1 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
This PlanetLab bullshit needs the plug pulled on it ASAP.
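
All three of those reverse DNS names start with "planetlab", so a crude check on the hostname catches them. This Python sketch is an illustration only: `is_planetlab_host` is a made-up helper, and it assumes the node's first hostname label begins with "planetlab", which holds for the three hits above but certainly not for every PlanetLab node out there.

```python
def is_planetlab_host(reverse_dns_name):
    """True when the first label of the hostname starts with 'planetlab'."""
    # Drop the trailing dot that reverse lookups return, then take the first label.
    first_label = reverse_dns_name.lower().rstrip(".").split(".")[0]
    return first_label.startswith("planetlab")
```

Hosts with no reverse DNS, or nodes named something else entirely, sail right past this, so treat it as one signal and not the whole barrier.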

Google Gags on Drop Down Menu

This was an amusing find in my log file:

66.249.72.171 - "GET /%20Select%20An%20Item%20 HTTP/1.1" 404 "-" "Mediapartners-Google/2.1"
Wouldn't you think all those high-priced Google employees would know better?
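
In case the percent-encoding hides the joke, decoding the path with Python's standard `urllib.parse.unquote` shows what was actually requested. The `decode_request_path` wrapper is just a name I made up for the example.

```python
from urllib.parse import unquote

def decode_request_path(path):
    """Turn %XX escapes in a logged request path back into plain characters."""
    return unquote(path)

print(repr(decode_request_path("/%20Select%20An%20Item%20")))
# prints '/ Select An Item '
```

That's the placeholder text of a drop-down menu being fetched as if it were a page.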

RED ALERT #2 UPDATE - It's SNAP snapping images

Recently I issued a RED ALERT on BBCOM and it appears it's Snap making screen shots.

I'm positive it's Snap as one of those screen shots has my firewall message displaying the IP address of 65.38.102.146 and the user agent "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060124 Firefox/1.5.0.1"

Too fucking bad Snap, I'm still blocking your dumb ass until you put your ID on the User Agent from those IPs.

No wonder people think their conversion rates are declining when fucking search engines are even skewing the numbers with unprofessional horseshit like this.

Tuesday, June 06, 2006

Charlotte, what a web you weave

In the BIG SHOCKER of the day department: when I installed my prototype bot blocker on another of my own websites today to expand my data stream, wouldn't you know it, it caught an active spider that was crawling many hundreds of pages the second it was enabled.

This crazy-assed spider is still going strong, or thinks it is, well over an hour after I stopped it from getting real data.

The spider is Charlotte, with these specs:

209.249.86.4 "Mozilla/5.0 (compatible; Charlotte/1.0b; charlotte@betaspider.com)"
Never heard of Charlotte before, so I took a peek to see what others might know about it and stumbled into the most hysterical thread I've ever seen in the OsCommerce Forums, about all these store owners frantically chasing down spider names and dropping them into something called their spiders.txt file or some shit.

Dudes, blocking by user agent is so 1990s, you'll stress out and become incontinent doing it that way. What a waste of time. Wish I was ready to help you all already, but such is life and quality takes time.

Monday, June 05, 2006

Blog Spam is not a problem

That's right, blog spam isn't the problem whatsoever, and technically it wouldn't even exist if it weren't for the sloppy work of the idiot programmers that write blog software. Anyone leaving the comment forms wide open so that any script kiddie can abuse them should have their programming license revoked.

Now there are even solutions springing up to monitor and stamp out blog spam and it's fucking ridiculous or would hysterical be the proper word?

For example, their feature claims:

"AntiSpam Deluxe. Get rid of referrer and comment spam thanks to a community maintained spam database."

Community maintained database, you must be kidding?

Can you people say CAPTCHA?

I know you can, put a captcha on the comments and registration page to stop automation.

The part that just kills me is whoever was naive enough to put "registration" as the only means of defense in blogs, without captchas, thinking that making people register to post would stop spam because bots don't use cookies or register. Wrong again, bots use cookies, so you better put a captcha on your registration page, otherwise the bots just register as some random name and post anyway. I've noticed a few people resorting to registration alone as an anti-spam technique lately and that simply won't cut it.
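
To show how little plumbing a captcha check needs, here's a toy sketch in Python. Everything in it is an assumption for illustration: the `SECRET_KEY`, the `new_challenge` and `check_answer` names, and above all the plain-text sum, which a bot could solve trivially. A real captcha would swap the question for a distorted image, but the issue-then-verify flow stays the same.

```python
import hashlib
import hmac
import random
import secrets

SECRET_KEY = b"change-me"  # hypothetical; never send the real key to the browser

def _sign(nonce, answer):
    """HMAC over the nonce and the expected answer."""
    message = ("%s:%s" % (nonce, answer)).encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def new_challenge():
    """Return (question, token); the token rides along in a hidden form field."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    nonce = secrets.token_hex(8)
    return "What is %d + %d?" % (a, b), nonce + ":" + _sign(nonce, a + b)

def check_answer(answer, token):
    """True only when the typed answer matches the one signed into the token."""
    nonce, _, signature = token.partition(":")
    return hmac.compare_digest(_sign(nonce, answer.strip()), signature)
```

Put `new_challenge()` on both the registration form and the comment form, and refuse the submission when `check_answer()` fails. This sketch doesn't expire or track used tokens, so a solved token could be replayed; a real one would need that too.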

Now address manual spamming as follows:
  • Filter out all posts from first-time posters that contain embedded HTML or URLs; just bounce any post that looks like spam.
  • Don't hyper-link names of posters to websites until they become a trusted poster or register.
I do these exact techniques on some of my websites (not this website, except the captcha) and the spammers simply went away. There was nothing they could do of value unless they wanted to just vandalize out of spite, and most are interested in making money so once I removed their incentive they stopped coming back.
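
Here's roughly what those two rules look like in Python. This is a sketch, not the code running on my sites: the regexes are deliberately crude, and `TRUSTED_AFTER`, `accept_post` and `link_poster_name` are hypothetical names with a threshold you'd pick yourself.

```python
import re

# Anything that looks like a tag or a URL; deliberately crude.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)

TRUSTED_AFTER = 10  # hypothetical: approved posts needed before links are allowed

def accept_post(body, approved_posts):
    """Bounce posts with HTML or URLs from posters who aren't trusted yet."""
    if approved_posts >= TRUSTED_AFTER:
        return True
    return not (HTML_TAG.search(body) or URL.search(body))

def link_poster_name(approved_posts, registered):
    """Only hyperlink a poster's name once they are registered or trusted."""
    return registered or approved_posts >= TRUSTED_AFTER
```

Set `TRUSTED_AFTER` to 1 if you only care about the very first post. A plain-text comment from a brand new poster goes straight through; the same comment with a link in it gets bounced.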

Can't post a link to your website?

Oh boo-fucking-hoo, then your spam won't drive visitors, so go the fuck away.

Wow, that was easy, wasn't it?
  • No community database
  • No anti-spam service
  • No bullshit
  • No spam
Now you're probably going to start yelling that my concept breaks the premise of blogs in that it's all about the community and the linking and all that other happy horseshit.

Nope, you can still link out, just not on your first damn post, or maybe your first 10 posts. The easiest way to build up "post count" would be using automation, which is blocked again, and that's where the captchas come back into play.

If you really want to piss off the spammers, require both registration AND a captcha.

Now go fix your crappy blog software, download a captcha and install it, and stop whining about spam ya bunch of cry babies.