I started noticing some leeched content showing up on a site called GlueText, which got me curious about how they were gathering their content.
Turns out they were initially using the default libwww-perl user agent back in '09:
99.231.221.217 "libwww-perl/5.820"
Looks like they got a little smarter after being bounced by sites and switched to the old Netscape Navigator user agent for Win 98, which they still use today!
99.231.78.89 "Mozilla/4.76 [en] (Win98; U)"
GlueText appears to have historically used the following IPs:
99.231.78.89 -> CPE0024b2cbf30a-CM0016b536fb82.cpe.net.cable.rogers.com
173.203.215.230 -> 173-203-215-230.static.cloud-ips.com
99.231.221.217 -> CPE0009a30119af-CM0016b536fb82.cpe.net.cable.rogers.com
99.231.44.115 -> CPE002436a0fbf3-CM0017ee4740ec.cpe.net.cable.rogers.com
76.65.207.92 -> TOROON63-1279381340.sdsl.bell.ca
My most current test showed they were now using the following IPs:
These IPs were from cloud-ips.com, all from GlueText:
173.203.210.51
173.203.210.95
173.203.215.230
173.203.241.192
Other IPs still involved:
76.65.207.92 -> TOROON63-1279381340.sdsl.bell.ca
99.231.78.89 -> CPE0024b2cbf30a-CM0016b536fb82.cpe.net.cable.rogers.com
Doesn't request robots.txt, fakes a Netscape user agent to gain access without permission, doesn't appear to document how it crawls content, and doesn't appear to give webmasters any way to opt out.
BAD ROBOT!
Blocked.
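For anyone who wants to flag these hits in their own logs, here is a rough sketch in Python of the kind of check described above. It is only an illustration, not how any particular server does its blocking: the names GLUETEXT_IPS, GLUETEXT_UA and looks_like_gluetext are made up for this example, and the IP list is just the addresses seen in this post, so it is an assumption that it is complete or still current.

```python
import ipaddress
import re

# IPs listed in this post (assumption: neither complete nor current).
GLUETEXT_IPS = {
    ipaddress.ip_address(ip)
    for ip in (
        "99.231.78.89", "99.231.221.217", "99.231.44.115", "76.65.207.92",
        "173.203.210.51", "173.203.210.95", "173.203.215.230", "173.203.241.192",
    )
}

# The two user agents from the log lines in this post:
# any libwww-perl version, or the exact fake Netscape 4.76 / Win98 string.
GLUETEXT_UA = re.compile(r"^(libwww-perl/|Mozilla/4\.76 \[en\] \(Win98; U\)$)")


def looks_like_gluetext(ip: str, user_agent: str) -> bool:
    """Return True if the IP or the user agent matches what this post observed."""
    return (
        ipaddress.ip_address(ip) in GLUETEXT_IPS
        or bool(GLUETEXT_UA.match(user_agent))
    )


if __name__ == "__main__":
    print(looks_like_gluetext("99.231.78.89", "Mozilla/4.76 [en] (Win98; U)"))
    print(looks_like_gluetext("10.0.0.1", "Mozilla/5.0 (X11; Linux x86_64)"))
```

Keep in mind the libwww-perl pattern catches every client using that default user agent, not just GlueText, and the residential Rogers and Bell addresses may well get reassigned to someone else, so treat a match as a hint rather than proof.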
2 comments:
Cool article keep up the good work. This site I will keep my eyes on in the future. Kim Denmark
So what name do we use for UA?
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [OR]