Monday, January 16, 2006

Scraping From AOL Users Possibly Confirmed

There has been speculation in the past that someone is scraping from behind AOL's proxy servers. Tonight I happened to catch it live and decided to try something.

The pages were downloading at a slow clip via AOL with a user agent like this:

Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; SV1; .NET CLR)

The minute I blocked them, they tried about 10 more URIs with that user agent string, and then it suddenly changed to:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR ; .NET CLR ; Something I removed)

To me this proves that someone is cloaking behind the bank of rotating AOL IP addresses. They just happened to download enough to catch my attention this time, and they tried a non-AOL browser once I stopped them.

Time to implement a little more sophistication in my bot blocker.
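
As a rough idea of what that extra sophistication might look like, here is a minimal sketch in Python. It is not the actual bot blocker. It assumes that rotating proxy IPs make per-IP counting useless, so it groups requests by user agent string and flags a near-constant gap between requests. The names ClipTracker and is_steady_clip and the thresholds WINDOW and MAX_JITTER are all made up for illustration.

```python
from collections import defaultdict, deque
from statistics import mean

WINDOW = 10       # hypothetical: how many recent requests to examine
MAX_JITTER = 2.0  # hypothetical: allowed deviation (seconds) from the mean gap


def is_steady_clip(timestamps, window=WINDOW, max_jitter=MAX_JITTER):
    """Return True if the last `window` requests arrived at a near-constant pace."""
    recent = list(timestamps)[-window:]
    if len(recent) < window:
        return False  # not enough history to judge
    gaps = [later - earlier for earlier, later in zip(recent, recent[1:])]
    average_gap = mean(gaps)
    # A script tends to produce evenly spaced requests; a person does not.
    return all(abs(gap - average_gap) <= max_jitter for gap in gaps)


class ClipTracker:
    """Tracks request times per user agent string, ignoring the client IP."""

    def __init__(self):
        self.seen = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, user_agent, timestamp):
        """Record one request; return True if this user agent now looks automated."""
        self.seen[user_agent].append(timestamp)
        return is_steady_clip(self.seen[user_agent])
```

Feeding it ten requests spaced five seconds apart under one user agent makes record() return True on the tenth. The obvious weakness is that real visitors sharing the same user agent string would blur the pattern, so something like this could only be one signal among several.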

Sorry slick, you aren't.

BUSTED!

2 comments:

nicholas ward said...

If my thought process isn't failing miserably, wouldn't this indicate a scraper that wasn't a script/bot but instead a Windows application that simulated a spider?

IncrediBILL said...

Whether it's an application or a script/bot is really irrelevant, as either can usually be modified to mimic some user agent.

The only things I know for certain are:

a) pages were coming at a slow but steady clip

b) IP address was definitely an AOL proxy cache server

c) when blocked, the traffic appeared to transition to a human with a browser for about 5 pages before giving up