There has been speculation in the past that someone is scraping from behind AOL's proxy servers. Tonight I just happened to catch it live and decided to try something.
The pages were downloading at a slow clip via AOL with a user agent like this:
Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; SV1; .NET CLR)
The minute I blocked them they tried about 10 more URIs with that user agent string and it suddenly changed to:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR ; .NET CLR ; Something I removed)
To me this proves there is someone cloaking under the bank of rotating AOL IP addresses and they just happened to download enough to catch my attention this time and tried a non-AOL browser once I stopped them.
Time to implement a little more sophistication in my bot blocker.
Sorry slick, you aren't.
BUSTED!
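For the curious, here is a rough sketch of what "a little more sophistication" could look like. It is not my actual blocker, just an illustration under some loud assumptions: the class name BotBlocker, the proxy_key helper, the thresholds, and the idea of grouping a proxy bank by its first three octets are all made up for this example. Grouping that way also risks blocking real AOL users who share the same proxies, so treat it as a starting point only. The point is that a block should stick to the proxy bank, not to the user agent string, so swapping browsers changes nothing.

```python
import time
from collections import defaultdict, deque


def proxy_key(ip):
    """Group IPv4 addresses by their first three octets (an assumption:
    that a rotating proxy bank sits inside one such range)."""
    return ip.rsplit(".", 1)[0]


class BotBlocker:
    """Hypothetical rate-based blocker keyed on proxy range, not user agent."""

    def __init__(self, max_hits=30, window=60.0):
        self.max_hits = max_hits          # hits allowed per window
        self.window = window              # window length in seconds
        self.hits = defaultdict(deque)    # key -> recent request timestamps
        self.agents = defaultdict(set)    # key -> user agents seen
        self.blocked = set()              # keys that are currently blocked

    def allow(self, ip, user_agent, now=None):
        """Return True if the request may proceed, False if it is blocked."""
        now = time.time() if now is None else now
        key = proxy_key(ip)
        self.agents[key].add(user_agent)

        # Once blocked, stay blocked no matter which user agent shows up.
        if key in self.blocked:
            return False

        # Sliding window: drop timestamps older than the window.
        q = self.hits[key]
        q.append(now)
        while q and now - q[0] > self.window:
            q.popleft()

        if len(q) > self.max_hits:
            self.blocked.add(key)
            return False
        return True

    def switched_agents(self, ip):
        """True if a blocked range has presented more than one user agent."""
        key = proxy_key(ip)
        return key in self.blocked and len(self.agents[key]) > 1


# Example with made-up addresses and a tiny threshold.
blocker = BotBlocker(max_hits=3, window=60.0)
for i in range(4):
    blocker.allow("192.0.2.%d" % (i + 1), "MSIE 6.0; AOL 9.0", now=float(i))
# A different address in the same range, with a different browser string.
still_allowed = blocker.allow("192.0.2.9", "MSIE 6.0", now=10.0)
```

In this run the fourth request trips the limit, and the later request with the non-AOL user agent is still refused because it comes from the same range. The switched_agents check is there to flag exactly the behavior described above: a blocked visitor coming back under a different browser string.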
2 comments:
If my thought process isn't failing miserably, wouldn't this indicate a scraper that wasn't a script/bot but instead a Windows application that simulated a spider?
Whether it's an application or a script/bot is irrelevant really as either can usually be modified to mimic some user agent.
The only things I know for certain are:
a) pages were coming at a slow but steady clip
b) IP address was definitely an AOL proxy cache server
c) when blocked, the visitor appeared to transition to a human with a browser for about 5 pages before giving up