Saturday, December 31, 2005

Slow Scrape Via AOL?

Something alluded to in a previous post, or maybe on someone else's site, as it's all becoming one long blur: there appears to be a slow crawl running across an entire block of 256 IP addresses.

The scraping seems to be coming from a couple of sources, and one of them is someone using AOL, as the IP resolved to an AOL proxy cache server. The implications are fairly disturbing: blocking the scraper might also block a bunch of AOLers.

Many aren't aware that AOLers get issued a new IP address while surfing the internet, typically around every 15 minutes, so a visitor who isn't using cookies is quite difficult to track once the IP changes.

It could be multiple visitors, but probably not, given the sequential nature of the pages being accessed and a couple of other factors that won't be mentioned so that the scrapers can't fake the behavior being targeted.
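To make the "sequential pages" idea concrete, here is a rough sketch of one way it could be measured. It is only an illustration and not the detection actually in use: it assumes pages carry a numeric id, and the names `block_of`, `sequential_ratio` and `ratios_by_block` are made up for this example. Hits are pooled per 256-address block, so a visitor hopping between proxy IPs still shows up as one stream.

```python
import ipaddress
from collections import defaultdict


def block_of(ip):
    """Return the 256-address (/24) network containing an IPv4 address."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def sequential_ratio(page_ids):
    """Fraction of consecutive requests that step to the very next page id."""
    if len(page_ids) < 2:
        return 0.0
    steps = sum(1 for a, b in zip(page_ids, page_ids[1:]) if b == a + 1)
    return steps / (len(page_ids) - 1)


def ratios_by_block(hits):
    """hits: iterable of (timestamp, ip, page_id) tuples.

    Requests are ordered by timestamp and pooled per /24 block, so the
    changing IP of a single visitor does not break up the sequence.
    """
    pages = defaultdict(list)
    for _, ip, page_id in sorted(hits):
        pages[block_of(ip)].append(page_id)
    return {net: sequential_ratio(ids) for net, ids in pages.items()}
```

A ratio near 1.0 for a block means nearly every request walked to the next page in order, which ordinary visitors rarely do. Any cutoff for "suspicious" would be a judgment call.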

So now comes the ultimate question, since this activity is quite obviously scraping: to block or not to block, THAT is the question!

If a block is put in place, it would obviously have to be some adaptive technology that analyzes the traffic and temporarily blocks only those IPs being used, until the suspicious activity subsides.
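A minimal sketch of what such a temporary, per-IP block might look like follows. The class name `TemporaryBlocker`, the thresholds and the 15-minute expiry are all assumptions for illustration, and the plain request count is just a stand-in for whatever signals really flag the scraper.

```python
from collections import defaultdict, deque


class TemporaryBlocker:
    """Block an IP only while its recent activity looks suspicious."""

    def __init__(self, max_hits=20, window=60.0, block_for=900.0):
        self.max_hits = max_hits      # hypothetical threshold, hits per window
        self.window = window          # seconds of history to examine
        self.block_for = block_for    # seconds a block lasts before expiring
        self.hits = defaultdict(deque)
        self.blocked_until = {}

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); True if it may proceed."""
        if self.blocked_until.get(ip, 0.0) > now:
            return False              # still inside an active block
        recent = self.hits[ip]
        recent.append(now)
        # Forget requests that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            # Suspicious: block this one IP for a while, then start fresh.
            self.blocked_until[ip] = now + self.block_for
            recent.clear()
            return False
        return True
```

Because each block expires on its own, an AOLer who is later handed the same proxy IP gets locked out for at most `block_for` seconds instead of forever.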

Just what I need, to start out '06 chasing uber-scrapers.

Happy New Year scrapers, it might be your last.


Anonymous said...

Wouldn't a C block actually contain 65536 IPs?

IncrediBILL said...

Maybe I got the term wrong, it was late, dunno, but I've purchased "1 full class-C" from my colo before and they always referred to it as a c-block.

IncrediBILL said...

The heck with it, since I'm not 100% sure I'm using the term right I edited the damn thing out.