Just to make sure our list doesn't grow stale, here's the new Nutch of the day:
18.104.22.168
No clue why a business email company needs a web crawler, but that's where it came from.
"NutchCVS/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; email@example.com)"
I noticed the Nutch developers picked up on my previous post. They are discussing forcing users to change the default user agent, which this crawler once again didn't do, and ways to reduce how much Nutch actually crawls any individual website.
Good luck on that effort guys, we can use it!