I started noticing some leeched content showing up on a site called GlueText, so I got curious about how they were gathering it.
Turns out that back in '09 they were initially using the default libwww-perl user agent:
188.8.131.52 "libwww-perl/5.820"
Looks like they got a little smarter after being bounced by sites and switched to the old Netscape Navigator user agent for Windows 98, which they still use today!
184.108.40.206 "Mozilla/4.76 [en] (Win98; U)"
GlueText appears to have historically used the following IPs:
220.127.116.11
My most recent test showed they are now using the following IPs:
These IPs were from cloud-ips.com, all from GlueText:
18.104.22.168
Other IPs still involved:
22.214.171.124 -> TOROON63-1279381340.sdsl.bell.ca
126.96.36.199 -> CPE0024b2cbf30a-CM0016b536fb82.cpe.net.cable.rogers.com
GlueText doesn't request robots.txt, fakes a Netscape user agent to gain access without permission, doesn't appear to document how it crawls content, and doesn't appear to give webmasters any way to opt out.
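If you want to check your own logs for this crawler, a minimal Python sketch might flag the two user-agent strings reported in this post and list IPs that fetched pages without ever requesting robots.txt. It assumes your server writes the Apache/nginx "combined" log format; the function name `scan`, the sample log lines, and the 192.0.2.x / 198.51.100.x addresses (RFC 5737 documentation ranges) are all hypothetical and are not GlueText's real traffic.

```python
import re
from collections import defaultdict

# Assumption: Apache/nginx "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)

# The two user-agent strings GlueText was seen using.
SUSPECT_AGENTS = {
    "libwww-perl/5.820",
    "Mozilla/4.76 [en] (Win98; U)",
}


def scan(lines):
    """Return (suspect_ips, no_robots_ips) for an iterable of log lines.

    suspect_ips:   IPs that sent one of the SUSPECT_AGENTS strings.
    no_robots_ips: IPs that requested paths but never /robots.txt.
    """
    suspect = set()
    paths_by_ip = defaultdict(set)
    for line in lines:
        m = COMBINED_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in combined format
        ip, path, ua = m.group("ip"), m.group("path"), m.group("ua")
        paths_by_ip[ip].add(path)
        if ua in SUSPECT_AGENTS:
            suspect.add(ip)
    no_robots = {ip for ip, paths in paths_by_ip.items()
                 if "/robots.txt" not in paths}
    return suspect, no_robots


# Hypothetical sample lines using documentation IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2010:13:55:36 -0700] "GET /article.html HTTP/1.0" '
    '200 5120 "-" "Mozilla/4.76 [en] (Win98; U)"',
    '198.51.100.7 - - [10/Oct/2010:13:56:01 -0700] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2010:13:56:02 -0700] "GET /article.html HTTP/1.1" '
    '200 5120 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    suspect, no_robots = scan(SAMPLE_LOG)
    print("Suspect user agent:", sorted(suspect))
    print("Never fetched robots.txt:", sorted(no_robots))
```

Because the IPs have changed over time, matching on the user-agent string is likely to catch more requests than an IP list alone, but a user agent is trivial to change, so the robots.txt check is worth keeping as a second, independent signal.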