Friday, January 20, 2006

Firefox Scraping Giggle of the Day

Just had an amusing scrape attempt from something claiming to be Firefox, with HTTP_REFERER set properly as it crawled and everything. It tried to rip 299 pages in 211 seconds (about 1.4 pages per second), which set off the alarms instantly, and the blocker automatically shut it down after serving only a fraction of those requests.

Maybe there's a Firefox plugin that does this. If so, it's just getting Firefox users blocked, but I don't care, as this kind of behavior isn't welcome.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; xxxx) Gecko/xxxxx Firefox/1.0.7

I was amused watching it happen in real time, but you lamo scrapers didn't really think it would work, did you?

My scraper blocker used to stop counting page requests once a visitor hit a specific threshold, and just temporarily blocked them. My latest modification keeps counting after that first trigger, so the page counter shows how many pages they really wanted. If the total gets too extreme, a second-level threshold kicks in and gives them a permanent ban automatically.
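
For anyone curious what a two-level trigger like that might look like, here is a rough Python sketch. It is not my actual blocker: the class name, the limits (30 and 200 pages), the 60-second window and the 300-second cooldown are all made-up values for illustration, and the replay at the bottom just spaces 299 requests evenly over 211 seconds.

```python
from collections import Counter
from dataclasses import dataclass

ALLOW, TEMP_BLOCK, PERMANENT_BAN = "allow", "temp_block", "permanent_ban"


@dataclass
class Visitor:
    window_start: float      # when the current counting window began
    last_seen: float         # time of the most recent request
    count: int = 0           # pages requested, including blocked attempts
    banned: bool = False     # set once the second threshold is crossed


class ScraperBlocker:
    """Hypothetical two-level blocker; all limits are assumed values."""

    def __init__(self, soft_limit=30, hard_limit=200, window=60.0, cooldown=300.0):
        self.soft_limit = soft_limit    # first trigger: temporary block
        self.hard_limit = hard_limit    # second trigger: permanent ban
        self.window = window            # counting window for normal visitors
        self.cooldown = cooldown        # quiet time needed to lift a temp block
        self.visitors = {}

    def check(self, ip, now):
        v = self.visitors.setdefault(ip, Visitor(window_start=now, last_seen=now))
        if v.banned:
            return PERMANENT_BAN
        blocked = v.count > self.soft_limit
        if not blocked and now - v.window_start >= self.window:
            # Well-behaved visitor: start a fresh window.
            v.window_start, v.count = now, 0
        elif blocked and now - v.last_seen >= self.cooldown:
            # Blocked visitor went quiet long enough: lift the temp block.
            v.window_start, v.count = now, 0
        v.last_seen = now
        v.count += 1                    # keep counting even while blocked
        if v.count > self.hard_limit:
            v.banned = True
            return PERMANENT_BAN
        if v.count > self.soft_limit:
            return TEMP_BLOCK
        return ALLOW


# Replay a scrape like the one above: 299 requests spread over 211 seconds.
blocker = ScraperBlocker()
tally = Counter(blocker.check("203.0.113.7", i * 211 / 299) for i in range(299))
print(dict(tally))
```

With those assumed limits, only the first 30 pages get served, the next 170 attempts hit the temporary block, and everything after request 200 lands on the permanent ban. A visitor who stops hammering and stays quiet for the cooldown gets a clean slate instead.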

Pretty obvious when it just keeps going that there's no human at the controls.

Too funny - got anything better to throw at me?

3 comments:

nicholas ward said...

Are you truly extending a challenge here? I'd love to work on the other end of this, but don't have enough time. Now if we're developing a commercial application, that's another story.

IncrediBILL said...

Um, it was just a rhetorical question, as the crawlers are so bad at times it's like a DoS attack already.

My main money making site has been hammered so many times today I'm getting all the 'help' I need on that end of the problem :)

nicholas ward said...

Think of it Bill -- we could have our cake and eat it too. Release staggering versions of a smart scraper and a scraper buster?

A new version every other week?

Hold the servers of thousands of unsuspecting webmasters hostage?

You've got to see the possibilities here. Solid as a rock. Huge upside. It worked for all those spyware guys!