I keep seeing this crawler for Houxou:
18.104.22.168 "HouxouCrawler/Nutch-0.8.2-dev (houxou.com's nutch-based crawler which serves special interest on-line communities; http://www.houxou.com/crawler; crawler at houxou dot com)"
When you go to their link, http://www.houxou.com/crawler, it doesn't say anything about the crawler; it just shows you their homepage. I'm not sure what special interest on-line communities you can possibly be serving when you can't even post the page your user agent claims is on your website.
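If you want to pull the same details out of your own logs, here is a minimal Python sketch that splits a user agent like the one above into its name, version, and the fields inside the parentheses. The function name `parse_user_agent` and the assumption that the string follows a "Name/Version (field; field; ...)" layout are mine, not anything Houxou or Nutch documents, so treat it as a rough helper rather than a general parser.

```python
import re

# The user agent string exactly as it appeared in the log line quoted above.
USER_AGENT = (
    "HouxouCrawler/Nutch-0.8.2-dev (houxou.com's nutch-based crawler which "
    "serves special interest on-line communities; "
    "http://www.houxou.com/crawler; crawler at houxou dot com)"
)

# Assumed layout: Name/Version (field; field; ...). Other bots may differ.
UA_PATTERN = re.compile(r"^(?P<name>[^/\s]+)/(?P<version>\S+)\s*\((?P<comment>.*)\)$")


def parse_user_agent(ua):
    """Return name, version, comment fields and any URLs, or None if no match."""
    match = UA_PATTERN.match(ua)
    if match is None:
        return None
    # Fields inside the parentheses are separated by semicolons.
    fields = [part.strip() for part in match.group("comment").split(";")]
    # Keep only the fields that look like links the bot claims to document itself at.
    urls = [f for f in fields if f.startswith(("http://", "https://"))]
    return {
        "name": match.group("name"),
        "version": match.group("version"),
        "fields": fields,
        "urls": urls,
    }


info = parse_user_agent(USER_AGENT)
print(info["name"], info["version"], info["urls"])
```

The URL it extracts is the one worth checking by hand, which is exactly where this crawler falls short.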
Before I gave up altogether, I decided to see what I could come up with on Google and found some interesting results, but the site appears to be down.
Nutch: search results
help. Hits 1-9 (out of about 9 total matching pages): WHOIS - 22.214.171.124 ... 20030922 source: RIPE person: Monu Ogbe address: 15 Penman Close, ...
nutch6.houxou.com:8080/search.jsp?query=ogbe&hitsPerPage=10 - 10k - Supplemental Result - Cached - Similar pages
Nutch: 搜索帮助 - [ Translate this page ]
搜索英文单词不区分大小写, 因此搜索NuTcH 等同于搜索nUtCh. ... 评分详解)显示Nutch如何给该网页打分. (anchors)显示指向该网页而被Nutch索引的anchor文本. ...
nutch6.houxou.com:8080/zh/help.html - 7k - Supplemental Result - Cached - Similar pages
So what's the deal?
Why is Houxou crawling with a link to a missing page about bots?
Is this just a ploy to get webmasters who are trying to figure out what the Houxou crawler is to look at their hosting services?
Who knows. Guess we'll just have to wait and see, but it smells fishy to me.