Stopping scrapers is like peeling an onion: peel away one layer of bad bot activity and you unearth a whole new layer that you couldn't see before. You start with user agent filtering, then put up speed bumps and honeypots to stop others, and even profile behavior to catch more, and they still keep coming. Now that we're several layers into this scraper onion, all sorts of new things are showing up that require more sophisticated methods to detect and block. They hide as browsers and run low-key crawls, but the access pattern still makes it obvious to anyone who looks that it's not a human.
To help thwart more of this nonsense, the latest tool added to the bot-blocking arsenal is the reverse DNS lookup, which shows where an IP originates so that bad sources can be blocked or challenged from the start. A few trends have become very obvious: many scraper IPs that were auto-blocked either didn't resolve to a domain name at all or came from suspicious hosting farms. The two most notable and persistent ones have been in Taiwan and the UK.
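For anyone who wants to try the same trick, here is a minimal sketch of a reverse DNS check in Python. It is not the blocker described in this post, just an illustration of the idea. The function names, the `SUSPICIOUS_SUFFIXES` list, and the decision to challenge non-resolving IPs while blocking known hosting farms are all assumptions made for the example. The suffix shown is a placeholder, not a real hosting company.

```python
import socket

# Hypothetical list of hostname suffixes for hosting farms you distrust.
# The entry below is a placeholder; fill in your own from your logs.
SUSPICIOUS_SUFFIXES = (".hosting-farm.example",)


def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the hostname an IP resolves to, or None if it has none.

    `resolver` is injectable so the lookup can be stubbed out in tests;
    by default it is the standard library's socket.gethostbyaddr.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        # and are what a failed lookup raises.
        return None
    return hostname.lower().rstrip(".")


def classify(ip, resolver=socket.gethostbyaddr, suspicious=SUSPICIOUS_SUFFIXES):
    """Return 'allow', 'challenge', or 'block' for an IP (assumed policy)."""
    hostname = reverse_dns(ip, resolver)
    if hostname is None:
        # No reverse DNS at all: treat as suspect and challenge it.
        return "challenge"
    if hostname.endswith(tuple(suspicious)):
        # Resolves into a hosting farm on the list: block outright.
        return "block"
    return "allow"
```

Calling `classify("192.0.2.10")` does a live lookup, which can be slow, so a real setup would probably cache the results per IP. Keep in mind that whoever controls the IP block also controls its reverse DNS record, so a friendly looking hostname should be confirmed with a forward lookup before you trust it.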
Now they'll need to find yet another way to get around the bot blocker, as the newly installed steel door now has a chain, two deadbolts, and a mean-as-piss rottweiler waiting on the other side just in case.
Stay tuned for more on the next episode of As the Onion Peels.
Monday, February 27, 2006
Peeling the Scraper Onion with Reverse DNS
Posted by IncrediBILL at 2/27/2006 10:13:00 AM