Saturday, March 29, 2008

WHO is Scraping My Site!

Note the lack of a question mark in the title: this isn't a question about "WHO?" but a statement about "WHO!", and by that I mean WHO as in an office of the World Health Organization.

It registered 411 page requests from a non-portable address assigned to the WHO Representative Office in Sri Lanka.

Here's the IP and UA:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"

Here's the WHOIS:
inetnum: -
netname: WHO-SLT-LK
country: LK
descr: WHO Representative Office
descr: 385, Health Inform. Centre, Suwasiripaya, Deens Road, Colombo-10
admin-c: NS198-AP
tech-c: NS198-AP
mnt-by: MNT-SLT-LK
source: APNIC

person: Network Administrator SLTNet
nic-hdl: NS198-AP
address: Sri Lanka
country: LK
mnt-by: MNT-SLT-LK
source: APNIC

It pretended to be a human browser, like so many of them do these days, by pulling all the images from the index page, and then it took off ripping pages like a bandit.

It wasn't even a smart bot, as the first link it hit off the index page was my bot trap, which is clearly flagged in robots.txt as a no-crawl zone and easily avoided, so it definitely wasn't human.

Of course the robots.txt file is my other bot trap but what the hell.
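
For anyone who hasn't built one, here is a minimal sketch of the bot trap idea in Python. It is not the setup used on this site: the `/trap/` path, the sample robots.txt, and the `hit_trap` helper are all assumptions made up for illustration. The only real API is `urllib.robotparser` from the standard library. The point is simply that any client requesting a path robots.txt disallows has shown it either ignored or never read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that marks the trap directory as a no-crawl zone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def hit_trap(parser, user_agent, path):
    """Return True if this request went somewhere robots.txt told crawlers not to go."""
    return not parser.can_fetch(user_agent, path)


parser = build_parser(ROBOTS_TXT)
ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
trap_hit = hit_trap(parser, ua, "/trap/page.html")   # request inside the no-crawl zone
normal_hit = hit_trap(parser, ua, "/index.html")     # ordinary page
```

A real trap would also hide the link from human visitors so that only automated link-followers ever reach it; that part is left out of the sketch.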

Then it went screaming along asking for the next 409 pages at 2-3 pages a second.
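
That kind of speed is easy to catch. Below is a rough sketch of a sliding-window rate check, assuming timestamps arrive in order and that more than two page requests inside one second is suspicious. The `RateFlagger` class and its thresholds are hypothetical, chosen only to match the 2-3 pages a second seen here.

```python
from collections import defaultdict, deque


class RateFlagger:
    """Flag clients that request more than max_hits pages within window_seconds."""

    def __init__(self, max_hits=2, window_seconds=1.0):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client -> recent request timestamps

    def record(self, client, timestamp):
        """Record one request; return True if the client is now over the limit."""
        hits = self._hits[client]
        hits.append(timestamp)
        # Drop requests that have fallen out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_hits


flagger = RateFlagger(max_hits=2, window_seconds=1.0)
# Three requests from the same (made-up) client inside one second.
results = [flagger.record("client-a", t) for t in (0.0, 0.3, 0.6)]
```

In this run the third request pushes `client-a` over the limit, so only the last entry in `results` is flagged.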

It would appear that WHO should check out the health of their computer network as something is rotten in their offices in Sri Lanka.


Ban Proxies said...

A lot of crap is coming from what should be "trusted sites".

Check this out:

- - [31/Mar/2008:14:48:19 +0000] "POST /comment/reply/7489 HTTP/1.1" 403
- - [31/Mar/2008:14:48:21 +0000] "POST /comment/reply/7475 HTTP/1.1" 403
- - [31/Mar/2008:14:48:22 +0000] "POST /comment/reply/7461 HTTP/1.1" 403
- - [31/Mar/2008:14:48:22 +0000] "POST /comment/reply/7467 HTTP/1.1" 403
- - [31/Mar/2008:14:48:27 +0000] "POST /comment/reply/7477 HTTP/1.1" 403
- - [31/Mar/2008:14:48:29 +0000] "POST /comment/reply/7483 HTTP/1.1" 403
- - [31/Mar/2008:15:51:59 +0000] "POST /comment/reply/213 HTTP/1.1" 403
- - [31/Mar/2008:15:51:59 +0000] "POST /comment/reply/8721 HTTP/1.1" 403
- - [31/Mar/2008:15:51:59 +0000] "POST /comment/reply/4841 HTTP/1.1" 403

3 posts in a minute to 3 different URLs... yeah right.
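
Spotting bursts like that by eye gets old fast. Here is a small Python sketch that counts POST requests per second from log lines in the shape shown above. The regular expression and the `posts_per_second` helper are assumptions written for this example, and the three sample lines are copied from the log excerpt with the client address already missing.

```python
import re
from collections import Counter
from datetime import datetime

# Matches: [timestamp] "METHOD path protocol" status
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\]\s+"(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+[^"]*"\s+(?P<status>\d{3})'
)


def posts_per_second(lines):
    """Count POST requests per one-second timestamp."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match is None or match.group("method") != "POST":
            continue
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        counts[ts] += 1
    return counts


SAMPLE = [
    '- - [31/Mar/2008:15:51:59 +0000] "POST /comment/reply/213 HTTP/1.1" 403',
    '- - [31/Mar/2008:15:51:59 +0000] "POST /comment/reply/8721 HTTP/1.1" 403',
    '- - [31/Mar/2008:15:51:59 +0000] "POST /comment/reply/4841 HTTP/1.1" 403',
]

# Keep only the seconds that saw three or more POSTs.
bursts = {ts: n for ts, n in posts_per_second(SAMPLE).items() if n >= 3}
```

With real logs you would group by client address as well, which this sketch skips because the addresses are not in the excerpt.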

Maybe it's related to "8,700 FTP Server Credentials in the Hands of Hackers."

Protectyourcontent said...

"A lot of crap is coming from what should be 'trusted sites'."

Totally agree!