Friday, December 01, 2006

Webmaster Owns Spammers Ass

This is priceless: one of the phpBB spamming idiots from Russia messed with the wrong webmaster this time, and that webmaster now owns his spamming ass.

You just have to read this DRC forum post to believe anyone could be so stupid.

Thanks to SpamHuntress for pointing this out.

Wednesday, November 29, 2006

Dear Amazon AWS Group

To whom it may concern,

Your bot crawled my site today as shown below. Please notify your engineers, and I use the term loosely, that "Java/1.5.0_09" is not a valid bot name. Being that Amazon sells books on how to program Java, I'm sure you can find at least one book in your warehouse that will explain how to set the User Agent string when making web requests.
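For the record, setting a proper User-Agent really is a one-liner in just about any HTTP library. A sketch in Python (the bot name and URLs are made up for illustration):

```python
import urllib.request

# Hypothetical bot name; the point is that a single header line replaces
# the useless default ("Java/1.5.0_09" in Java, "Python-urllib/x.y" here).
req = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": "ExampleBot/1.0 (+http://www.example.com/bot.html)"},
)

# No network needed to confirm the header is set:
print(req.get_header("User-agent"))
# ExampleBot/1.0 (+http://www.example.com/bot.html)
```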

Additionally, would honoring ROBOTS.TXT be too much to request, or do you feel justified not checking the robots file since your programmers can't figure out how to tell us what your bot name is in the first place?

[] "Java/1.5.0_09"
[] "Java/1.5.0_09"
[] "Java/1.5.0_09"
[] "Java/1.5.0_09"
[] "Java/1.5.0_09"
[] "Java/1.5.0_09"
[] "Java/1.5.0_09"
[] "Java/1.5.0_09"
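Honoring robots.txt isn't much harder; Python even ships a parser in the standard library, and a bot with no name gets treated like any other stranger. A sketch (the example.com URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt that bans every crawler from the whole site.
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The nameless Java default UA matches no specific rule, so it falls
# under the wildcard ban like everyone else:
print(rp.can_fetch("Java/1.5.0_09", "http://www.example.com/page.html"))  # False
```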
It makes me weep for the future when a big web conglomerate, one that has a name that is synonymous with buying things online, one that should know better, starts to slide down that slippery slope of being a bad netizen.

Get A. Clue

SiteAdvisor and ThePlanet Must Not Care

There were several hits on my blog post about SiteAdvisor from Network Associates, which owns McAfee SiteAdvisor, yet nothing changed. Wouldn't you assume that after reading my posts about SiteAdvisor green-lighting sites with worms in them, someone would at least change the site status to protect people?


Funny, Symantec's Norton AntiVirus agrees with me that the site has a worm, but SiteAdvisor says you're good to visit.

Maybe they don't think it's a threat because McAfee AV products don't detect this worm?

Who knows, I'll stick with Norton AV.

Then again, we have ThePlanet, which hosts these sites; they were notified 6 days ago that this problem existed on 4 of their servers, and these sites are still online and functional.

I guess nobody cares about security these days.

Tuesday, November 28, 2006

BDFetch Plays By The Rules

Normally I'm always slamming corporate bots, but when one company, like brandimensions, appears to be playing by all the rules, I feel they should get a little praise.

Here's what their access attempts look like:

"GET /robots.txt HTTP/1.1" "" "BDFetch"
"GET /somepage.html HTTP/1.1" "" "BDFetch"
"GET /robots.txt HTTP/1.1" "" "BDFetch"
"GET /somepage.html HTTP/1.1" "" "BDFetch"
"GET /robots.txt HTTP/1.1" "" "BDFetch"
"GET /somepage.html HTTP/1.1" "" "BDFetch"
At least they asked for robots.txt and appear to only go in when allowed.

However, they had a couple of bumps that I'd like to see them fix.

1. Ask for robots.txt once or twice a day, maybe once an hour worst case, not on every access.

2. Set your reverse DNS to something identifiable so we can verify it's really your company and not someone spoofing you.

3. Include a link to a page about your crawler in the user agent, and a version number, such as "BDFetch/1.0 +..."
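Point 1 is easy to implement: cache each site's robots.txt with a timestamp and only refetch when the copy goes stale. A minimal sketch (the cache class and fake fetcher are hypothetical, and the fetch function is injected so no network is needed):

```python
import time

# A minimal robots.txt cache: refetch at most once per hour instead of
# on every single request.
class RobotsCache:
    def __init__(self, fetch, max_age=3600):
        self.fetch = fetch        # callable(host) -> robots.txt text
        self.max_age = max_age    # seconds to trust a cached copy
        self.cache = {}           # host -> (timestamp, text)

    def get(self, host, now=None):
        now = time.time() if now is None else now
        entry = self.cache.get(host)
        if entry is None or now - entry[0] > self.max_age:
            entry = (now, self.fetch(host))
            self.cache[host] = entry
        return entry[1]

calls = []
def fake_fetch(host):
    calls.append(host)
    return "User-agent: *\nDisallow:"

cache = RobotsCache(fake_fetch)
cache.get("www.example.com", now=0)
cache.get("www.example.com", now=10)    # fresh, served from cache
cache.get("www.example.com", now=4000)  # stale, refetched
print(len(calls))  # 2
```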
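Point 2 is the forward-confirmed reverse DNS check the big search engines recommend for verifying their own crawlers: resolve the IP to a hostname, confirm the domain, then resolve that hostname back and make sure it returns the original IP. A sketch with the DNS lookups stubbed out (bdfetch.example.com and the IPs are made-up placeholders):

```python
# Forward-confirmed reverse DNS: both lookups must agree, otherwise
# anyone could set a fake PTR record claiming to be the crawler.
def verified_crawler(ip, expected_domain, reverse_lookup, forward_lookup):
    host = reverse_lookup(ip)
    if not host.endswith("." + expected_domain):
        return False
    return ip in forward_lookup(host)

# Stubbed DNS tables so the sketch needs no network:
rev = {"192.0.2.10": "bdfetch.example.com"}
fwd = {"bdfetch.example.com": ["192.0.2.10"]}

def reverse_lookup(ip):
    return rev.get(ip, "")

def forward_lookup(host):
    return fwd.get(host, [])

print(verified_crawler("192.0.2.10", "example.com", reverse_lookup, forward_lookup))   # True
print(verified_crawler("203.0.113.5", "example.com", reverse_lookup, forward_lookup))  # False
```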

Other than those minor glitches, kudos for at least trying to play by the rules and at least giving webmasters the choice to allow you to crawl or not.

Nicely done.

Legality of Stealth Robots, Are They Trespassing?

What is the legality of a stealth robot? Is it doing anything wrong?

Take a look at "Computer Hacking and Unauthorized Access Laws" and you'll see there's a quagmire of various laws, but the topic most relevant to this discussion is "unauthorized access," which basically covers trespassing onto a computer. Theoretically that applies even if the service is a public web server, since the laws don't specify that the server or service has to be private.

I'm no lawyer, so this obviously isn't valid legal advice, just my musings over the content of the California law, particularly the definitions in 502.c:

(c) Except as provided in subdivision (h), any person who commits any of the following acts is guilty of a public offense:

(1) Knowingly accesses and without permission alters, damages, deletes, destroys, or otherwise uses any data, computer, computer system, or computer network in order to either (A) devise or execute any scheme or artifice to defraud, deceive, or extort, or (B) wrongfully control or obtain money, property, or data.

(3) Knowingly and without permission uses or causes to be used computer services.
Let's examine what these transparent stealth crawlers do and see if it fits the definition.

First, the people using stealth crawlers know if they use a real user agent like "Bob's Bot 1.0" that it will expose their presence and they will be blocked. To avoid this, they mask their presence which obviously falls under "knowingly accesses and without permission" to get to the content on the web site attempting to block their trespass.

Second, after they have gained access they "wrongfully control or obtain ..., property, or data" and do with it as they please, republish without permission, use to compile reports, etc., so I think we've covered two aspects here.

Even if the act itself causes relatively little harm, there is still a potential for penalty.
(3) Knowingly and without permission uses or causes to be used computer services.

(A) For the first violation which does not result in injury, and where the value of the computer services used does not exceed four hundred dollars ($400), by a fine not exceeding five thousand dollars ($5,000), or by imprisonment in the county jail not exceeding one year, or by both that fine and imprisonment.
The obvious way for the crawler to be technically "legal" is to simply identify the bot with an obviously unique name like "Bob's Bot 1.0" and stop spoofing the web server into thinking it's Internet Explorer or Firefox in order to gain access.
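That's the whole point of a unique bot name: once the crawler admits who it is, a webmaster can grant or deny access with two lines of robots.txt. A hypothetical example (token simplified to "BobsBot", since robots.txt user-agent tokens shouldn't contain spaces):

```
User-agent: BobsBot
Disallow: /
```

A bot that reads this and goes away has permission sorted out; one that hides behind a browser UA to get past it is the one the "knowingly and without permission" language seems aimed at.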

I'd be curious what some legal minds might think about this interpretation of these laws for this particular application.

Sunday, November 26, 2006

Huge Made for AdSense Scraper and Spammer Operation Unveiled

The downside of scraping the wrong webmaster is that your websites now contain breadcrumbs that let that webmaster unravel a big chunk of your network of sites that you've been scraping and spamming.

I'm not going to even go into the list of domains I found my scrapings on, as it's a huge list, and the specific sites I found were all hosted at a couple of the same hosting companies.

Besides, if I expose the list this MFA scraper spammer might figure out how I unraveled his system and we wouldn't want that, now would we?

I'm not even going to bother with the IP they were scraping from or the user agent since it was a spoofed browser UA of course, and the IPs doing the scraping were all from the same hosting companies listed below.

Instead, let's start at the top of the iceberg with their statistics pages, each listing 400-500 sites per page, which in total link to roughly 6,500 individual scraper sites, and I'm sure we're just scratching the surface here.
So where do these sites host? All of them trace back to a handful of data-center hosting companies, Everyones Internet among them.

There you go, it could've been a long spew of data, but there's really nothing you need to know except BLOCK access from data centers and you'll be a bit more secure, which I've been preaching for quite some time.
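Blocking data-center ranges takes only a few lines once you have the netblocks. A sketch using Python's ipaddress module (the CIDR ranges here are documentation prefixes, not the real hosting companies' netblocks, which you'd pull from whois):

```python
import ipaddress

# Hypothetical data-center netblocks to block; a real list would come
# from whois/ARIN lookups on the hosting companies.
BLOCKED = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/22")]

def from_data_center(ip):
    """Return True if the visitor's IP falls inside a blocked netblock."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)

print(from_data_center("192.0.2.77"))  # True
print(from_data_center("8.8.8.8"))     # False
```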

Now, let's look at one specific site and you'll see how they really spam the search engines with 3-digit subdomains. All of their sites are like this, and there are literally hundreds of thousands, if not millions, of junk pages associated with this one group of domains.
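Spotting this particular pattern in your own logs is trivial: flag any hostname whose leftmost label is a bare 3-digit number. A sketch (the hostnames are invented):

```python
import re

# Match hostnames whose first label is exactly three digits,
# e.g. "123.spamdomain.example" (a made-up domain).
SPAMMY = re.compile(r"^\d{3}\.")

hosts = ["123.spamdomain.example", "www.example.com", "007.spamdomain.example"]
print([h for h in hosts if SPAMMY.match(h)])
# ['123.spamdomain.example', '007.spamdomain.example']
```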

And we'll take a peek at another of these sites to see how they promote themselves with blog and forum spam for traffic.

There you have it all with scraping, search engine spam and blog and forum spam all tied up in one neat little package.


P.S. Did we piss on someone's cornflakes?

Getting a ton of hits to this post via a forum, which makes you go Hmmmm... it's amazing how they out themselves once you post something.

Anti-Phish Shootout: Firefox 2 vs. MSIE 7

Another phishing email arrived today so I tried it in both Firefox 2 and MSIE 7 as fast as I could get the link pasted into the browsers.

The resulting screenshots are below, and the score speaks for itself.


Maybe Microsoft should hire some of the Firefox developers so they can show them how to do anti-phish properly.