Friday, March 07, 2008

Slow Down Nosy SEO's and Snooping Competitors

Most webmasters spend a lot of time and effort marketing their website, or pay someone a lot of money to do it, yet skip a few common sense steps that keep lazy and nosy assed SEO's or other competitors from quickly analyzing all that hard work and simply stealing it.

Not that you can completely stop them because much of the competitive information about who links to you is already public, collected by search engines and toolbars, but you can sure as hell make it a little more difficult to get the rest of the data they want.

Since the SEO Chicks published a list of competitive research tools to help those nosy SEO's snoop, I just thought it would be fair and useful to have a nice list of ways to stop as many of those snooper tools as possible.

Block Archive.org - No need to let anyone see how your site evolved, snoop or even scrape through archive pages without your knowledge so block their crawler.

User-agent: ia_archiver
Disallow: /
Rumor has it that the ia_archiver may crawl your site anyway so adding it to your .htaccess file is a good precaution as well.
RewriteCond %{HTTP_USER_AGENT} ^ia_archive
RewriteRule ^.* - [F,L]
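A quick way to sanity check the robots.txt rule above before deploying it is to run it through a parser. This is only a minimal Python sketch using the standard library's `urllib.robotparser`; the `is_allowed` helper and the example.com URL are made up for illustration, and it only shows what a well-behaved crawler should conclude from the file, not what ia_archiver actually does in the wild.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from the post.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real site from the post.
    print(is_allowed("ia_archiver", "http://example.com/any/page.html"))
    print(is_allowed("Googlebot", "http://example.com/any/page.html"))
```

Since robots.txt is purely advisory, the .htaccess rule is still what actually enforces the block.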
Block Search Engine Cache - Some people cloak pages and just show the search engines raw text yet show the visitors a complete page layout. Who cares, that's your business and a competitive edge you don't need to share, plus pages can be scraped from search engine cache as well, so disable cache on all pages.

Insert the following meta tag in the top of all your web pages:
<meta content='NOARCHIVE' name='ROBOTS'>
Block Xenu Link Sleuth - Why do you need people sleuthing your site? Screw 'em...

Add Xenu to your .htaccess file as well:
RewriteCond %{HTTP_USER_AGENT} ^ia_archive [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu
RewriteRule ^.* - [F,L]
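It helps to know exactly what those two RewriteCond patterns will and won't catch. The Python sketch below mimics them with the `re` module, assuming the same behavior as mod_rewrite without the `[NC]` flag: case-sensitive and anchored to the start of the User-Agent string. The `is_blocked` name and the sample user agent strings are illustrative only.

```python
import re

# Patterns copied from the RewriteCond lines in the post. Without [NC],
# mod_rewrite matches case-sensitively, so these are compiled the same way.
BLOCKED_PATTERNS = [re.compile(r"^ia_archive"), re.compile(r"^Xenu")]


def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent would hit the [F] (403 Forbidden) rule."""
    return any(pattern.search(user_agent) for pattern in BLOCKED_PATTERNS)


if __name__ == "__main__":
    # Sample strings for illustration; real tools may send something else.
    for ua in ("ia_archiver", "Xenu Link Sleuth 1.2i",
               "xenu link sleuth", "Mozilla/5.0 (compatible; Xenu)"):
        print(ua, "->", "blocked" if is_blocked(ua) else "allowed")
```

Note that a lowercase variant or a user agent that merely contains the name somewhere in the middle slips through, which is one reason the white list approach further down is the stronger option.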
Make Your Domain Registration Private - Why give the SEO's or any other competitor any clues to help them whatsoever?

Sign up with DomainsByProxy and this will make the nosy little bastards happy:
WHATEVERMYDOMAINNAME.COM
Domains by Proxy, Inc.
DomainsByProxy.com
15111 N. Hayden Rd., Ste 160, PMB 353
Scottsdale, Arizona 85260
United States
Restrict Access To Unauthorized Tools - Use .htaccess to white list access to your site and just allow the major search engines and the most popular browsers, which will block many other SEO tools. If you don't understand the white list method and it scares you, there are a few good black lists around too.

This is a limited sample for informational purposes only, just to give an idea of how it works. See the thread linked above for more in-depth samples by WebSavvy, and be cautious in implementing a white list as it's very restrictive:
#allow just search engines we like, we're OPT-IN only

#a catch-all for Google
BrowserMatchNoCase Google good_pass

#a couple for Yahoo
BrowserMatchNoCase Slurp good_pass
BrowserMatchNoCase Yahoo-MMCrawler good_pass

#looks like all MSN starts with MSN or Sand
BrowserMatchNoCase ^msnbot good_pass
BrowserMatchNoCase SandCrawler good_pass

#don't forget ASK/Teoma
BrowserMatchNoCase Teoma good_pass
BrowserMatchNoCase Jeeves good_pass

#allow Firefox, MSIE, Opera etc., will punt Lynx, cell phones and PDAs, don't care
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well


order deny,allow
deny from all
allow from env=good_pass
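
To see who that sample lets in before trying it on a live site, the same opt-in logic can be dry-run offline. This is a rough Python sketch that reuses the BrowserMatchNoCase patterns from the sample above as case-insensitive regular expressions; the `good_pass` function name just mirrors the environment variable, and the test user agents are examples, not a complete list.

```python
import re

# Patterns taken from the BrowserMatchNoCase lines in the sample above.
ALLOW_PATTERNS = [
    "Google", "Slurp", "Yahoo-MMCrawler", "^msnbot", "SandCrawler",
    "Teoma", "Jeeves", "^Mozilla", "^Opera",
]
_ALLOW = [re.compile(p, re.IGNORECASE) for p in ALLOW_PATTERNS]


def good_pass(user_agent):
    """Return True if the user agent would be let in by the white list."""
    if not user_agent:
        # Blank or missing user agents match nothing, so they are denied.
        return False
    return any(rx.search(user_agent) for rx in _ALLOW)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
        "Xenu Link Sleuth 1.2i",
        "Lynx/2.8.5rel.1 libwww-FM/2.14",
        "",
    ]
    for ua in samples:
        print(repr(ua), "->", "allow" if good_pass(ua) else "deny")
```

Anything that claims to start with "Mozilla" still gets through, so this only filters tools that are honest about who they are.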

Disclaimer: I don't use .htaccess for much so please don't ask for a complete file, this is just a sample as I use a more complex real-time PHP script to control access to my site.

Block Bots and Speeding Crawlers - You can use something like the nifty PHP bot speed trap Alex Kemp has written or Robert Plank's AntiCrawl. Just another layer of security piled on against snoops and scrapers that pretend to be MSIE or Firefox to avoid the white list or black list blocking in .htaccess.
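
For anyone curious what a speed trap boils down to, here is a bare-bones Python sketch of the general idea: count hits per IP inside a sliding time window and ban anything that goes over the limit. It is not Alex Kemp's script or AntiCrawl; the `SpeedTrap` class, its thresholds and the in-memory storage are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class SpeedTrap:
    """Ban IPs that make more than max_hits requests within window_seconds."""

    def __init__(self, max_hits=10, window_seconds=5.0):
        self.max_hits = max_hits
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned = set()

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if refused."""
        if ip in self.banned:
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) > self.max_hits:
            self.banned.add(ip)
            return False
        return True


if __name__ == "__main__":
    trap = SpeedTrap(max_hits=3, window_seconds=10)
    # 203.0.113.9 is a documentation-only address used as a stand-in.
    for t in (0, 1, 2, 3):
        print(t, trap.allow("203.0.113.9", now=t))
```

A real deployment would need persistent storage and a white list for the search engine crawlers you actually want, otherwise a fast legitimate bot gets trapped too.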

Block Snoops From Robots.txt - Don't allow anyone other than your white listed bots to see your robots.txt file because it has other stuff in it that SEO snoops might find interesting, and it can become a security risk. Use a dynamic robots.txt file like this Perl script on WebmasterWorld and just add the rest of your allowed bots to the code next to Slurp, Googlebot, etc.
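
The dynamic robots.txt idea is simple enough to sketch in a few lines. The Python below is not the WebmasterWorld Perl script, just an illustration of the same approach: white listed bots get the real rules and everyone else gets a blanket disallow. The bot list, the `/private/` path and the `robots_txt_for` function name are hypothetical.

```python
# Substrings that identify the bots allowed to see the real file (assumed list).
ALLOWED_BOTS = ("googlebot", "slurp", "msnbot", "teoma")

# Hypothetical real rules; your own file will differ.
REAL_ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

# What everybody else sees: nothing useful.
DENY_ALL_ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent):
    """Pick the robots.txt body to serve for a given User-Agent header."""
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in ALLOWED_BOTS):
        return REAL_ROBOTS_TXT
    return DENY_ALL_ROBOTS_TXT


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(robots_txt_for("SomeSnoopTool/1.0"))
```

Because the check relies on the User-Agent header alone, anyone faking a search engine name would still see the real file, so treat it as a first filter only.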

Block DomainTools - since SEO's use it to snoop, no reason to let DomainTools have access so just block 'em.

There are probably lots of other things you should be blocking as well, but this will give you a good start.

This list doesn't completely stop snoops from manually looking at your site, but it certainly stops all of those automated tools from ripping through all your pages, search engine or archive cache, and presenting a nice pretty report.

Heck, why should you help people take away your own money?

Start slowing them down today and stop the next up and comer from getting the info too easily.

UPDATE:

One more creative thing you can do to your website is cloak the meta tags so that only the search engines see them and disable the meta tags for normal visitors. Nothing really wrong with this because meta tags by definition are only for the search engines and snooping SEO's will be completely left in the dark when they can't see your meta keywords or description.

This works especially well if you combine cloaking meta tags with the NOARCHIVE option described above, so they're completely hidden from prying eyes.
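
As a rough illustration of the meta tag cloaking described above, the Python sketch below builds a page head that always carries the NOARCHIVE tag but only includes the description and keywords when the visitor looks like a search engine. The `render_head` function, the token list and the sample text are all hypothetical, and matching on user agent alone is an assumption that a stricter setup would tighten.

```python
from html import escape

# Substrings treated as search engine crawlers (assumed list).
SEARCH_ENGINE_TOKENS = ("googlebot", "slurp", "msnbot", "teoma")


def render_head(title, description, keywords, user_agent):
    """Return the <head> markup, hiding meta description/keywords from non-bots."""
    lines = ["<head>", f"<title>{escape(title)}</title>"]
    # Everyone gets NOARCHIVE so cached copies don't leak the tags.
    lines.append('<meta name="robots" content="noarchive">')
    ua = (user_agent or "").lower()
    if any(token in ua for token in SEARCH_ENGINE_TOKENS):
        lines.append(f'<meta name="description" content="{escape(description)}">')
        lines.append(f'<meta name="keywords" content="{escape(keywords)}">')
    lines.append("</head>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_head("Widgets", "Best widgets in town", "widgets, blue widgets",
                      "Mozilla/5.0 (compatible; Googlebot/2.1)"))
```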

93 comments:

Kamo said...

Welcome to my feed reader :)

Great post.

Would appreciate some kind of follow-up with some of the more complex php scripts you mentioned. Not the exact ones you're using, but maybe some tutorial or something.

Thanks again!

Anonymous said...

When will you be launching CrawlWall?

It has been a long time coming ;)

Johann said...

I agree that blocking the Internet Archive is a very good idea. There are other smaller archives besides archive.org though.

When you mention ia_archiver, you might want to add that heritrix thing, too, as well as other open source crawlers. Like... Nutch ;-)

I'm not sure about the whitelisting. There are lots of legitimate user agents around that don't fit into these schemes.

IncrediBILL said...

johann, I've been whitelisting for years. It's pretty safe because most of the other crap provides no value.

Bluesplinter said...

I'll second the question about CrawlWall... waiting for that tool has kept me from digging too deeply into other, less effective methods.

Mick said...

This is why I love Bill's site: always great info for the like-minded and easy to understand in layman's terms.
I almost suspect Bill does not want to even bring out Crawlwall anymore so that the baddies do not dissect and decode it... totally understandable... perhaps bring out a lighter version we all could purchase and use, Bill?

Johann said...

Bill, I've just converted my setup to whitelisting and will see how it goes.

I need some more rules than you do, though. I think I have around 100 at the moment. These include a lot of mobile browsers.

IncrediBILL said...

Johann, I did say don't use the list in my blog post. It's just a proof of concept type of thing, not really ready to use. WebSavvy's is much more evolved, if you send her a PM she'll probably give you her current list or post it.

IncrediBILL said...

FWIW, for those asking, product release stalled on some medical issues.

Promised the wife I wouldn't launch something and leave her holding the bag until the medical issue seemed to be resolved and it's as good as it's going to get.

So now I'm back to ironing out some technical stuff, fun fun!

Anonymous said...

Anyone know the IPs that Alexa crawls from?

Ban Proxies said...

How many site owners have a dedicated test domain? Test everything!

If you use "the nifty PHP bot speed trap Alex Kemp has written" enable the whitelist option. This script will ban the "Adsense Bot". Integrating IP-Whitelist with Unruly Bot-Blocking Script.

Thanks for the "AntiCrawl" link Bill. I'll give it a test run after I've finished playing with M&M Autoban, Download Link

Ban Proxies said...

"Add Xenu to your .htaccess file as well:

RewriteCond %{HTTP_USER_AGENT} ^ia_archive [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu
RewriteRule ^.* - [F,L]"

The first UA is Alexa.

Anonymous said...

How, exactly, are people looking at all of these things "taking your money"? Particularly if they actually hit web.archive.org's or google's servers, rather than yours.

The only one of those suggestions that I would agree with is a speed trap that chucks a captcha at any IP requesting N or more pages in a short span of time or more than some amount of megabytage; a passed captcha being good for some number of hours before you might get another one. This impedes humans minimally, but stops badly behaved bots in their tracks that, by generating too much traffic, genuinely are costing you money by raising your hosting costs.

Ban Proxies said...

"Particularly if they actually hit web.archive.org's or google's servers, rather than yours."

NOARCHIVE, NOINDEX, NOFOLLOW, NO-CACHE

Anonymous said...

Using the User Agent to block is a waste of time, IncrediBill. While well intentioned, a serious hack would fake the user agent in the first place, and for that reason I think the post needs some more work.

protectyourcontent.org

IncrediBILL said...

Hey anonymous - I don't black list by UA because it mostly is a waste of time but some people are too afraid to WHITE LIST so I offer both options - pay attention in class ;)

If you white list, ALL user agents that you don't expressly authorize are disallowed. It's not blocking by user agent, it's blocking anything NOT ALLOWED as a user agent, so everything other than a narrow list of UA's gets dumped by default.

Then you use Alex's speed trap script or AntiCrawl to stop the rest of the crap.

Very solid combo.

IncrediBILL said...

@other anonymous - Although there are similar aspects to blocking scrapers and other bots, they aren't the only things out there that can do harm in some variety.

We aren't talking bandwidth theft this time, we're talking about people using tools to gain easy access to your intellectual property to attempt to replicate your success in the search engines.

That's how your competitors or SEO's can take your money: they find out what you rank for and how you get that ranking, and successfully mimic it.

Not that you can stop SEOs from finding out what they want to know, but you certainly don't have to hand it to them on a silver platter by letting their little site crawling SEO software comb through your site to gain intel.

Anonymous said...

Call me protectyourcontent,

The reason I post as Anonymous is that registering is a pain in the arse, and you made the site inaccessible in so many ways I didn't feel the site deserved my registration...

Anyway, your system depends on user agents that can be easily fooled and faked, so I think it does not protect any sites and gives a false sense of security to people who have read it.

Anonymous said...

incredibill,
From:protectyourcontent.org

I think you are fully aware user agent blocking is a waste of time and, as you know, can be very easily faked, so I have to ask: why are you posting flawed advice?

Ban Proxies said...

Someone needs to "Protect" their gray matter.

IncrediBILL said...

OK anon@protectyourcontent.org, re-read it. I'm suggesting they don't block, but WHITE LIST.

I could point them to some full trip DNS checking for all the valid bots but I'm trying to not overload people as this wasn't exactly a bot post but about stopping snooping SEOs.

Sheesh.
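
For readers wondering what the "full trip DNS checking" mentioned above looks like, here is a minimal Python sketch: reverse-resolve the IP, check the hostname belongs to a known crawler domain, then forward-resolve that hostname and confirm it maps back to the same IP. The `verify_crawler_ip` function is made up for illustration, and the domain suffixes are assumptions that should be checked against each search engine's own documentation. The lookup functions are injectable so the logic can be tried without touching the network.

```python
import socket

# Hostname suffixes assumed to belong to the major crawlers; verify before use.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com",
                    ".search.msn.com", ".crawl.yahoo.net")


def verify_crawler_ip(ip, reverse_lookup=None, forward_lookup=None,
                      suffixes=TRUSTED_SUFFIXES):
    """Return True only if ip passes a reverse and then forward DNS round trip."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(suffixes):
            return False  # PTR record is not in a trusted crawler domain
        # The forward lookup must include the original IP, or the PTR was faked.
        return ip in forward_lookup(host)
    except OSError:
        return False  # lookup failed, so treat the visitor as unverified


if __name__ == "__main__":
    # Fake resolvers with made-up records, so this runs offline.
    ptr = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com"}
    a = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}
    print(verify_crawler_ip("192.0.2.10", ptr.__getitem__, a.__getitem__))
```

Combined with a white list, this catches visitors that claim a search engine user agent from an IP that doesn't belong to that search engine.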

IncrediBILL said...

So how did I set up the blog to be too hard to type in a name and URL and only be anonymous?

You click the radio button "Name/URL" and then type them in.

I know it's rocket science so post your email address and we'll have a NASA tech support specialist contact you ASAP to help with this matter.

Protectyourcontent said...

Sorry IncrediBill, yep, worked out your system now, my fault.

Yep, you're right, the post was to slow down snooping competitors. We're all on the same side mate, yeah cool, I see where you're coming from.

I wanted to post some javascript to make it awkward to copy text on a page as a contribution but get this error

Your HTML cannot be accepted: PHP, ASP, and other server-side scripting is not allowed

IncrediBILL said...

You can't post code unless you make the less than or greater than symbols into &lt; and &gt;

The narrow formatting in the blog will tend to jack most of the code up anyway so it's better to just post links to code elsewhere unless it's a very tiny snippet.

Protectyourcontent said...

Yeah I can understand, I strip HTML out of contact forms as a matter of course now. I posted the code on my site. Like you said it's also about slowing down competitors so I thought it was worth mentioning.

Now what do you think of the idea of protecting content through frames and ajax?

Protectyourcontent said...

IncrediBill, if you're still there, I want to show you a system that I think will end all automated content theft, scraping, hijacks and spam bots. It works through frames and Ajax and preserves accessibility, if you're interested...

IncrediBILL said...

Of course I'm interested, but Ajax is a PITA for SE's to crawl.

The easiest way to stop scrapers is to put your navigation in obfuscated JS and cloak that version to all non-SE visitors.

The problem is, it's not handicapped accessible and the blind won't be able to use your site.

Not positive, but I think AJAX is just as problematic for the blind for the same reasons.

IncrediBILL said...

Frames? That's not a problem to crawl as I wrote my own little crawler and processing frames is trivial.

Protectyourcontent said...

IncrediBill, I said this system will mean the complete destruction of black hat SEO, so of course I have taken into account frame content and have a backend system that kicks in to only allow content to be displayed through JavaScript if you scrape the frame. At best you will only ever be able to scrape a homepage and never get the meta tags which is what these guys are really after...

All working off one url so social bookmarking is preserved.

Frames are used as a default to preserve accessibility of the net.

No more banning ip's that often include large networks of people.

No more user agent lists. This system just destroys every automated scraper, hijacker and email harvester on the market. And I want to give it away for nothing.

Like I said, if you're interested I would be honoured to set you up a test.

Protectyourcontent said...

I don't think you'll understand until I set up the test. Trusted IPs and search engines see the content without frames and Ajax.

For the test I will need you to supply your IP, download User Agent Switcher in Firefox and upload the XML file of user agents including Google, Ask, Yahoo and MSN.

Protectyourcontent said...

Also IncrediBill, this system protects the frame content from links by changing the URL either by the minute, by the hour, by the day or by the month with an encrypted code. So any black hat setting up links to the frame content would be wasting their time.

Protectyourcontent said...

I even deal with API SOAP XML abuse from search engine feeds as well, and total image protection, and am just completing PDF protection for this system.

RSS protection will be dealt with by creating a summary of the article that you can vary randomly.

Like I said the complete destruction of black hat seo.

IncrediBILL said...

Provide my IP address?

I think you've gone MAD!

Totally MAD!

Not publicly anyway as the BH scrapers and spammers would DoS my connection into oblivion within minutes.

Maybe I can find a nice proxy IP to use... ;)

Protectyourcontent said...

Yep, it's mad to think some drunk Irish South London Ex-Boxer can defeat a massive network that scares the shit out of everyone online. But I can, and your blog and matss was a large part of helping me do it. Just visit the site, I'll grab your IP and post the last two digits for you to confirm.

Honestly mate I think I got them beat and want you to see it first.

Protectyourcontent said...

Trust me, visit by proxy if you want, but otherwise please trust me. Look, switch off your router and restart, release and renew DNS after you're done, but I am asking you to trust me here IncrediBill, I won't let you down.

IncrediBILL said...

I smell a sneak attack coming... sending the family away to the 'safe house'... j/k

Protectyourcontent said...

Actually IncrediBill, use a proxy, that's safer for you, and I want these bastards to see what's coming and feel completely powerless to do anything about it, just like I did when they took me out. Post the IP of the proxy and I'll set up the test.

IncrediBILL said...

Besides, I believe you.

I've been advocating cloaked javascript for comment forms at a minimum for years but the handicapped accessibility people always claim that won't work.

The problem is that I know for a fact some of the sneakier skilled BH's out there can easily encapsulate the JS engine from Firefox source and/or use MSIE's SDK to drive the navigation and get full access to the final result.

Won't be as fast but crawling a complete JS site is completely doable except by the lame assed script kiddies.

Protectyourcontent said...

You will learn in time that I am not one of them and can be trusted, you will see it on the test. It's OK, use a proxy. I totally appreciate what you do for the net so that comment was unkind.

I was gonna show you the code behind the system, that's why I said your IP. But for a test I can just show the system.

Protectyourcontent said...

"sneakier skilled BH's out there can easily encapsulate the JS engine from Firefox source and/or use MSIE's SDK to drive the navigation and get full access to the final result."

Not when it's framed as well and you can protect the frame content and every page of the site carries that protection. But yes, I am here so you can help improve what I have already created.

That's why I want to show you it. Also, to preserve accessibility, JavaScript protection only kicks in once the IP has been identified as a potential threat.

Protectyourcontent said...

And what if even if they reach the content all the meta tags are made bullshit for each page.

Reminds me, I have some great new tar pits to discuss with you. What about one that fills the scraper with "I'm a scraper and I'm ok, I scrape all night so I can sleep all day"... sound fun?

Protectyourcontent said...

Come on IncrediBill, surely it's more fun than serving a 404.

IncrediBILL said...

Who serves up 404s?

I serve up "FUCK YOU!" pages.

Protectyourcontent said...

Good call, respect! I think the best word to use is "CUNT", this automatically disallows them from safe search.

Protectyourcontent said...

Actually mate you should edit that and put some stars on that word I don't wanna cause you a problem.

Give me a proxy ip so I can set up the test.

Protectyourcontent said...

For protectyourcontent.org

I wanna use nancy sinatra "my baby shot me down" as the soundtrack and pictures from "the good the bad the ugly" where clint eastwood is a white hat for the flash all done on a slideshow projector from the thirties how does one go about obtaining copyright permission?

Protectyourcontent said...

A workin project but here's something we been workin on imagine nancy sinatra blended in...

http://www.protectyourcontent.org/workingon/blckhat.swf

Protectyourconntent said...

Ok I'll move it up a folder

http://www.protectyourcontent.org/blckhat.swf

Protectyourcontent said...

Incredibill I am here because I value your input and will always value your input. I am not here to replace you compete with you or any other such thing.

I am here to completely destroy black hat seo not to damage you in any way shape or form. You have been at the front of content protection for a long time and in my opinion I am just a beginner and you are the expert.

mick said...

Interesting boys... I love it, keeps us in the loop eh!

Perhaps if it is good enough, Bill, you can integrate it into crawlwall.
----------------------
AS "RICKY" SAYS "WE NO STRANGERS TO SCRAPING,AND YOUR HEARTS BEEN ACHING TO A SCRAPING LALALALAL
http://www.internetisseriousbusiness.com/
---------------------
Don't forget me when the product's done eh boys :)

Protectyourcontent said...

Rather confident aren't you "mick".

Like I said, all scrapers will be ended with this system. All Black Hat SEO is over. So laugh all you want "mick", I just fucking ended your industry and IncrediBill already knows it.

Protectyourcontent said...

Still laughing now "mick" !!!!!

There is no cure to this solution and I will release it open source.

Still fucking laughing "mick" or are you choking about now.

Protectyourcontent said...

Advice for black hats feeling sick right now at the end of their industry. Two choices: buy a gun and blow your head off, or create genuine sites....

I couldn't give a damn which one you choose.

Protectyourcontent said...

Actually I do care about black hats. Most are young, and inside every black hat there is an original idea waiting to come out. If these guys focussed on positive things they could offer something unique and original to the internet. And I would be so proud of any of them that made the next big thing online and I think they are all capable of doing it.

Mick said...

whoooooah
Ease down there partner.

Lighten up a bit.

I am on your side, what kind of sense of humour have you fuckin got?

I have been a fan of this site for ages, as Bill would attest.

In fact just the other day I was searching for some security stuff and landed on your page and was surprised you were here as well.

Good luck with your technology and hopefully good comes out of it.
Don't forget me when you have a product to sell as I believe in you guys.
Peace... take it easy eh!

Mick said...

Oh I see, you think that I think that you are going to make moolah from this perhaps... and were kind of offended maybe?

In that case sorry for the misunderstanding... it just came out wrong.

But as Bill and the fans of this blog would attest we are waiting with cash in hand for crawlwall.

But if you are going down the freeware route, power to you my friend.

Anonymous said...

We aren't talking bandwidth theft this time, we're talking about people using tools to gain easy access to your intellectual property[snip!]

I wanted to post some javascript to make it awkward to copy text[snip!]

protecting content[snip!]

content theft[snip!]

this system protects the frame content[snip!]

image protection[snip!]

pdf protection[snip!]

this automatically disallows them from safe search.

obtaining copyright permission?

Good luck with your technology and hopefully good comes out of it.

waiting with cash in hand for crawlwall.

Stupid!

http://www.againstmonopoly.org
http://www.questioncopyright.org
http://www.google.ca/search?q=%22economics+of+free%22+site%3Atechdirt.com

RSS protection[snip!]

Really stupid!

put your navigation in obfuscated JS[snip!]

cloak that version[snip!]

cloaked javascript[snip!]

I serve up "FUCK YOU!" pages.

.swf

crawlwall.

Evil!

never get the meta tags which is what these guys are really after...

all the meta tags are made bullshit for each page.

Majorly stupid!

If nobody can see your meta tags, Googlebot can't see your meta tags. If Googlebot can't see your meta tags, you're screwed.

I totaly appreciate what you do for the net

Really fucking stupid!!

There is no cure to this solution[snip!]

What you say ?!

------------

Negative sum games will get you people nowhere. Though if they're taken too far they may very well land us all in the soup.

The only winning move is not to play.

IncrediBILL said...

I know I shouldn't feed the troll but the only thing really fucking stupid is the last comment left above this one.

Besides, I really don't see a problem cloaking search engine META tags just to the search engines and not showing them to visitors because browsers don't do anything with those meta tags and snooping SEO's don't need to see them.

Now please go find that rock you crawled out from under and get back under it.

Doug Heil said...

It's just a silly SEO person who is pissed off about what you stand for. He/she built one of those tools you want to block, and are showing others how to block.

Anonymous said...

Protectyourcontent - is that you keniki ?

Josh Wexelbaum said...

On February 29th, I blogged a much more extensive list of competitive research tools:

http://www.scrappybusiness.com/competitive-intelligence-tools.htm

Protectyourcontent said...

I apologise Mick, I seem to have a team of "blackhats" that post under various names following me, and I made a mistake by assuming you were one.

I certainly will check out crawlwall but I really want to make this open source, at least my solution, as this will piss off as many blackhats as possible and I hope protect as many people as possible.

Perhaps you guys could work on a more advanced version for serious threats and this could be developed for crawlwall, and you could allow not-for-profit sites or charity organisations this advanced protection for free. Perhaps there are contributions I can make to a more advanced version.

Mick said...

All is cool Bro... Apology should be mine, as I probably should put more thought into my posts.

But hey we are all on the same side.
To be honest my sites have very little traffic or earnings, but I got so pissed off once when someone hijacked my site... I became one of these MAD as hell types... for me it was the principle, as I would never harm anyone's site.

Thus searching I found Bill's site and have been a fan ever since.
It is interesting to say the least, like a game of cat and mouse with the blackhatters.

------------------
I know Bill would probably frown on this, and does not do it for profit, but let me get this off my chest.

I was actually thinking Bill should change/migrate this forum into a modern outfit (shoe/johnC type) and even advertise, so the funds can go into, if he so chooses, cancer charity or whatever.
The reason I say this is we could have a whole lot of functions, as now I have to scroll down posts to answer as it gets buried... in a new setup, say, the top five posts of the day can be bumped up when someone answers.
Also we can have a questions section or Help section where we can ask questions and receive help or help others etc. I believe this would be the Web's premier site, probably rank PR8 :) easily in its field if this was done, if it is not already, and fans such as Ban Proxies etc. could moderate if they so choose.
Just tossing ideas.
What say you Bill?

SEO blog said...

IncrediBILL

If a competitor really wants to screw up your web site and compare it with your previous version, as well as analyze your changes to take the best effort and replicate it on his web site, he certainly doesn't use instruments like xenu or archive.

In any case, those you described are really good techniques to limit bandwidth usage and archiving by unauthorized tools.

Anonymous said...

Bandwidth usage costs you money. People archiving your stuff doesn't unless your business model is stupid, and actually nets you money if your business model is smart.

I'm not an SEO type either. SEO is fairly silly. Focus on providing a good product, earn a reputable name, and protect from trademark infringement, and customers will beat a path to your site. The more time, energy, and money you spend on making your site less compatible and harder for people to access instead of on improving your products and services, though, the worse your odds become.

This isn't a trick by a "black hat". It's just simple business sense, and it's echoed by sites like techdirt.com and useit.com that are surely not suborned.

Protectyourcontent said...

"if a competitor real want to screw up your web site ...blah blah....certainly doesn't use instruments like xenu or archive."

IncrediBill is spot on. Black hats often use the original text from authority sites to destabilise their listings, and the Internet Archive is one of their main sources.

Blocking the USER AGENT, yes, I see where you're coming from IncrediBill, but I have found that ia_archiver actually obeys robots.txt, with the added advantage that if you block it in robots.txt rather than by user agent it will prevent all archived content from being viewed.

Protectyourcontent said...

"The more time, energy, and money you spend on making your site less compatible and harder for people to access instead of on improving your products and services, though, the worse your odds become.
This isn't a trick by a "black hat". It's just business sense "

This isn't a trick either. I am gonna allow people to keep their sites completely accessible and protect against scum that scrape and hijack their competitors, and also email harvesters. It's not a trick, it's just Clausewitz business. If someone goes after you, don't get even, destroy their entire industry.

Anyway incredibill you still want me to set up the test?

Actually I can do it on any IP now thanks to some points you brought up. And yes black hats, it will go open source just as soon as I can confirm it's good enough.

IncrediBILL said...

"People archiving your stuff doesn't unless your business model is stupid, and actually nets you money"

That's the second stupidest shit anyone has posted on this thread.

What you miss is the archiving gives others a point of access to your content, even old content, and some people have been SUED over archived content!

Quick for instance was a guy that I know that got in trouble for using a logo for a product he didn't have authorization to sell, they sent him a C&D, he fixed his site, but still got sued because it existed in Archive.org.

It's just a bad idea to let your site get archived, for other reasons as well.

Protectyourcontent said...

"It's just a bad idea to let your site get archived, for other reasons as well."

Totally agree, and to correct my previous post, you should use both robots.txt and the user agent to block ia_archiver. Anyway, on that note here's the meta tag to stop archive.

<meta name="robots" content="noarchive">

Ban Proxies said...

Protectyourcontent,

You need to protect your own content .

Server Response: Scumbag URL
HTTP Status Code: HTTP/1.1 200 OK
Date: Sun, 16 Mar 2008 14:46:18 GMT
Server: Apache/2.0.52 (Red Hat)
X-Powered-By: PHP/5.2.0
Content-Length: 6644
Connection: close
Content-Type: text/html

I can use your own server against you to create millions of pages. Some SEO's and competitors will do almost anything to get ahead in the SEs.

Anonymous said...

"That's the second stupidest shit anyone has posted on this thread."

Not unless Mike Masnick over at Techdirt is stupid, and I have a sneaking suspicion that he isn't.

As for the lawsuit, well stupid lawsuits happen and this was clearly a stupid one. That lawsuit clearly belongs to the same category as the ones that prompt disposable coffee cups to have disclaimers and cautions about how the contents may be hot.

Protectyourcontent said...

Hi Ban Proxies

"You need to protect your own content ."

He He a challenge....

OK I uploaded a robots.txt with this in it.

User-agent: *
Disallow: /*?

Mick said...

This is what I use, just for reference. I still get people looking up my cache, although it does not exist in the major 3.

<meta name="robots" content="index,follow,nocache,noarchive">

Mick said...

@protectyourcontent

Here is a guy whom I check with every now and then.
If anyone can condense this into a smaller file it would be good.

http://www.aaronlogan.com/downloads/htaccess.php

Protectyourcontent said...

Hi Mick, thanks for the link,

The trouble with lots of systems I see is that they rely on IP blocking, and with broadband a single IP can cover a large set of people, and I just hated blocking them.

So that's where I got to thinking: what if IPs, instead of being banned, had three states of progress? A new IP would be set to a default, then depending on behaviour could either become a whitelisted or blacklisted IP or remain at default. And you served content based on that behaviour.

Whitelist would get all your hard-worked-at accessible code. Default would get the code framed, and as all modern screen readers support framed content this would preserve accessibility, but protection would surround the content of the frame, and if any rule was broken the IP would become blacklisted. You could even have a set of rules that decided between a permanent and a temporary blacklist.

Now here's the thing: instead of being banned, the blacklist gets served content through AJAX and needs JavaScript enabled. Otherwise they get nothing. You can even frame the JavaScript, as at this stage accessibility has gone out the window and you are preserving the site for other users on that IP.

In a system like this there would never need to be a reason to ban an IP or user agent. User agents of SEs (reverse DNSed, of course) would be on the whitelist.
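
To make the idea concrete, here is a minimal Python sketch of the three states described above. It is only an illustration under stated assumptions: the class name `IPTracker`, the threshold of 20 clean requests before whitelisting, and the rule that a broken rule blacklists an IP from any state are all made up for the example, and the reverse DNS check for search engines is left out, with `whitelist()` standing in for it.

```python
from enum import Enum


class State(Enum):
    DEFAULT = "default"
    WHITELIST = "whitelist"
    BLACKLIST = "blacklist"


# Hypothetical threshold: clean requests needed before an IP is trusted.
CLEAN_HITS_TO_WHITELIST = 20

# What each state is served, following the description above.
CONTENT_MODE = {
    State.WHITELIST: "plain",   # full accessible markup
    State.DEFAULT: "framed",    # content inside a protected frame
    State.BLACKLIST: "ajax",    # content only via JavaScript
}


class IPTracker:
    def __init__(self):
        self.states = {}      # ip -> State
        self.clean_hits = {}  # ip -> count of well-behaved requests

    def state_of(self, ip):
        # Unknown IPs start in the default state; nobody is banned.
        return self.states.get(ip, State.DEFAULT)

    def whitelist(self, ip):
        # Stand-in for a crawler verified by reverse DNS elsewhere.
        self.states[ip] = State.WHITELIST

    def record(self, ip, broke_rule):
        """Update an IP's state after one request and return the new state."""
        if broke_rule:
            self.states[ip] = State.BLACKLIST
            self.clean_hits[ip] = 0
        elif self.state_of(ip) is State.DEFAULT:
            hits = self.clean_hits.get(ip, 0) + 1
            self.clean_hits[ip] = hits
            if hits >= CLEAN_HITS_TO_WHITELIST:
                self.states[ip] = State.WHITELIST
        return self.state_of(ip)

    def content_mode(self, ip):
        return CONTENT_MODE[self.state_of(ip)]


tracker = IPTracker()
print(tracker.content_mode("203.0.113.5"))   # new IP, served framed
tracker.record("203.0.113.5", broke_rule=True)
print(tracker.content_mode("203.0.113.5"))   # now served via AJAX only
```

A temporary blacklist could be added by storing a timestamp next to the state and dropping the IP back to default after it expires.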

Would you like me to set up a demo, Mick? I think IncrediBILL has maybe lost interest.

IncrediBILL said...

@anon - you shouldn't be outing other anon posters, and I don't care who it was or how smart he thinks he is; that was some ignorant shit in the post without anything to back it up.

I agree with you the lawsuit was more or less stupid, but the judge allowed it and the poor bastard spent a lot of money defending himself when blocking the Internet Archive was all it took to keep out of trouble in the first place.

Often the best way to avoid problems is to avoid the places where those problems happen in the first place, ever hear of "stay out of trouble"?

Replace the word "trouble" with Archive.org and you've got it.

Protectyourcontent said...

IncrediBILL, look, I have known hijacking was possible for years, and I have known canonical issues were there for years. The problem is not "stay out of trouble", it's the fact that as a webmaster I had the ethos that it was morally wrong to use these issues to attack another website.

This is a whole new internet where it's considered natural to attack a competitor, part of the job. Since posting here I have had two DNS attacks on my server. Why? Because they're scared of what I might say next, and they should be.

This right now has become an internet that blackhats think they can control with spam, and in this climate who in their right mind would invest in a website or web business ever again? Pretty soon the web will be as full of spam sites as email is. By the way, clients, don't think you're safe if you're with a blackhat; they're already attacking each other.

Are we going to sit back and let them destroy the internet in the same way they destroyed email? I say no! It's time to fight back. And to do it we need to unite, get organised, share our knowledge and have a place we can talk privately.

Mick said...

@protectyourcontent.

Thanks for the offer, mate, I am flattered, but I am simply a baby in bathwater when it comes to this kind of programming and code.

So I would most likely be a bad candidate to bounce ideas off or get feedback from... Thanks anyway.
I suggest that if Bill is busy, or is taking in what you're writing to act on later, perhaps Ban Proxies can team up with you, or you could even get in touch with the guy whose link I gave above, to team up on an experiment.
Best regards.
P.S. But do by all means keep us in the picture, if you so wish, as a board group.

Protectyourcontent said...

Jesus, you work your heart out trying to design a system that completely eliminates black hat SEO but protects accessibility, and it seems like no one is that interested........

Maybe I should just forget it.

IncrediBILL said...

Who said we weren't interested?

Post a temporary beta site where we can take a look and take it down in a couple of days.

See, I went down the javascript path and there's only so far a large site such as Target or Walmart can go into javascript before getting their asses sued by the visually impaired for accessibility issues.

For instance, you can make a form SPAM FREE simply by converting it into obfuscated javascript, but the screen readers the blind use can't read the form unless it's in raw HTML.

However, you can add javascript events to the form such as keeping track if keys were pressed and forwarding that data along so you can easily detect if a human typed vs. a bot post.
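
A rough sketch of the server side of that idea, in Python, assuming the page's script counts key presses and submits the count in a hidden field. The field name `key_events` and the function `looks_human` are hypothetical, and a bot that knows about the field could fake it, so treat this as one signal among several and not as proof.

```python
def looks_human(form: dict) -> bool:
    """Guess whether a form post was typed by a person.

    Assumption: client-side script increments a counter on each key
    press and submits it in a hidden 'key_events' field. The form
    itself stays in plain HTML, so screen readers can still use it.
    """
    try:
        key_events = int(form.get("key_events", "0"))
    except (TypeError, ValueError):
        # Missing or mangled counter: treat the post as suspect.
        return False
    return key_events > 0


# Hypothetical submissions: one typed by hand, one posted by a bot
# that never ran the page's script.
print(looks_human({"comment": "Nice post", "key_events": "9"}))
print(looks_human({"comment": "Buy cheap pills"}))
```

Visitors with JavaScript turned off would also fail this check, so a failed check is better used to hold a post for review than to reject it outright.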

Like I said before, I don't doubt what you can do with javascript, went down some of that path myself, and I'm willing to take a peek, but too much javascript can literally make a site illegal if you're using it in the ways I suspect, especially in the EU.

Anonymous said...

Hey, fuckhead.

Yeah, you -- fuckhead.

I posted the other comment too. I wasn't outing anyone. Mike Masnick writes a lot of stuff over at Techdirt including the article that I linked to in the first of those two comments. I agree with him. He disagrees with you. A bunch of smart economists also agree with him, and disagree with you.

Perhaps you should take a break from blogging until you've taken and passed an Econ 101 course somewhere.

HTH, HAND, and all that.

IncrediBILL said...

Who are all those smart economists anyway, Friedrich Engels and Karl Marx?

We've all had enough of your commie rants about ending copyrights because most of the adult world, aka the REAL WORLD, makes a living off their copyrighted intellectual property so grow up and shut the fuck up already.

Victor said...

yeah! and then lets block "view source" from the browser, hell lets trap the right-click event altogether. why should you let people right-click your site anyway? i say we block firefox too, people can look at your CSS! OMFG!

Kinderfeestjes said...

Thanks! Blocking archive.org is a good one. A lot of black hat SEO's steal your old (unique) content using archive.org.

Anonymous said...

"Who are all those smart economists anyway, Friedrich Engels and Karl Marx?

We've all had enough of your commie rants"[snip!]

Commie? I think we have a case of the pot calling the kettle pink, here. The only commies in the room are the ones advocating state-granted monopolies and state intervention in the free market in the form of so-called "intellectual property". The ones advocating the limitation of mine and others' *actual* property rights in our computer hardware, in books we've bought, and in other things by telling us what we can and can't do with them after we've paid for them. And so forth.

As for blocking right clicking and "view source", that's easy to defeat. The browser has to see the source to render it, and a browser can always be made that will show it directly to a human, that will ignore any code telling it not to and fool the site into thinking it's not, and so forth. Block Firefox? Firefox will get disguised as IE. Block right clicks? It's easy to override that in Firefox without turning off JS.

As for that last comment, well, I don't think I'll even bother dignifying any more ridiculous posts that use the nonsense phrase "steal content" with a detailed response.

IncrediBILL said...

Oh look, if it isn't our resident troll coming back yet again to see if someone will give him/her/it validation about the copyright-free fantasy land they live in.

Yawn.

Sorry, I won't validate your endlessly drooling babble about copyright anymore.

Here's $0.25, go find a pay phone and call someone that gives a shit.

Anonymous said...

Fantasy land, my left nostril. Read Techdirt. Read almost any other blog. The winds of change are beginning to blow. You can batten down, or get out of the way, or learn the true meaning of "Category 5"; your choice. :)

Protect Your Content said...

I agree the winds are changing, just not in your direction. The trouble with people like you is you have no invention. That's why you will always lose..........

Protect Your Content said...

Let's be honest, Anonymous, you are too scared to even say your name. What are you? Nothing! A parasite? Nah, you are not even that important. You do nothing, influence nothing, are nothing. Why are you even part of this conversation? You're just a something that never happened. Now piss off.

Protect Your Content said...

I dunno if the Americans will fully get this, but we have a saying in London that fully says it all, Anonymous.........

"I wouldn't piss on you if you were on fire"

Protect Your Content said...

I want to know just how damaging to your networks it will be when we put through anti-competitive legislation, upheld by EEC law, that says only a manufacturer is allowed to use their brand in a domain name.

Protect Your Content said...

And let me say this: loans, you're fucked; you will give us your leads for nothing as the credit crunch bites and you become desperate for business. My industry, well, I will totally destroy you and take out every trick you have, starting with pretending to be the manufacturer.

Protect Your Content said...

Now we are in this instance going to use my favourite brand, Apple. If people were allowed to register domains such as "appleipod.co.uk" it would be safe to assume any consumer would be misled into thinking this domain was owned by Apple. A search result that said "domain.co.uk/appleipod.html" would not mislead the user in the same way.