Sunday, March 19, 2006

Educating the Public About Scraping

After talking to a lot of people lately, including many webmasters and internet-savvy Silicon Valley types, it has become obvious to me that they are simply oblivious to the entire problem of rogue bots and scrapers. Most of the people I've discussed this with are aware of crawlers and of things like robots.txt, but are completely in the dark about how often bots bypass these so-called standards. Ultimately, they leave the conversation with a new level of fear about the security of their online content, run to the nearest console, and start searching for unauthorized use of their content which, as we all know, they typically find without too much trouble.
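For anyone who wants to see what "bypassing" robots.txt looks like in practice, here is a minimal sketch using Python's standard urllib.robotparser. It assumes you have already pulled (user agent, path) pairs out of your own access log; the function name, the sample rules, and the bot names are all made up for illustration and are not taken from any real log.

```python
from urllib.robotparser import RobotFileParser


def find_violations(robots_txt, requests):
    """Return the (user_agent, path) pairs that robots.txt disallows.

    `requests` is a hypothetical list of (user_agent, path) tuples,
    e.g. extracted from a web server access log.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent, path in requests
        if not parser.can_fetch(agent, path)
    ]


# Illustrative rules: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Illustrative requests; both bot names are invented.
SAMPLE_REQUESTS = [
    ("GoodBot/1.0", "/index.html"),
    ("SneakyScraper/0.1", "/private/archive.html"),
]

violations = find_violations(ROBOTS_TXT, SAMPLE_REQUESTS)
for agent, path in violations:
    print(f"{agent} ignored robots.txt and fetched {path}")
```

Keep in mind that robots.txt is only a request, so a check like this tells you who ignored it after the fact; it does nothing to stop them, and a scraper can send any user agent string it likes.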

The real eye-opener for most of those who aren't building sites that thrive off Webmaster Welfare™ (aka AdSense) seems to be the entire AdSense economy, which fuels the bottom-feeding scraper subculture that Google has unwittingly created. Once they understand the motivation, not only is it clear to them why scrapers scrape, but many wonder why they didn't think of it first! Then it's obvious that the low-hanging fruit has universal appeal and that everything on the net is fair game for the unethical types who pluck that fruit at any cost.

So the question that remains, after this small sampling of industry-savvy folks, is: how widespread is the blissful ignorance of this pandemic?

I wonder how many people learn something about this for the first time when they hit this web site and just think I'm a paranoid loon in a tinfoil hat, dancing with a flute to celebrate the summer solstice.

I suspect the depth of the problem is not known by most, even by webmasters fighting one-off copyright infringement. Those who even have a hint think it's being overblown, and from what I've seen in my banned log files over the last week, it will get a lot worse before it gets better.

3 comments:

UnreferencedVariable said...

What is the best way to find sites that have scraped your content? I have tried searching for sentences from my site. Is that the best way, or do you have some tips and tricks?

Rob Clark said...

A good site that will check automatically is http://www.copyscape.com

UnreferencedVariable said...

Cool, thanks Rob.