Friday, March 28, 2008

REBI-Shoveler Digging for Korean Search Engine

REBI-Shoveler must be easy to overlook, because it's very unusual to type a user agent into a search engine and get no authoritative hit from any bot hunter whatsoever. There were tons of hits from various web stats pages, but nothing I could easily find that gave me any clue what in the hell this thing was.

With this little information, all I knew was that it came from Korea; otherwise I was stumped: "REBI-Shoveler v0.1"
Finally, I decided to look for more clues in the several years of bot-tracking archive files I keep, and sure enough, there was a single original hit on my server that contained the answer I was looking for:
"REBI-Shoveler/RS Ver. -100.0 (REBI's great worker ... ;;"
This bot operates out of multiple IPs in the 116.122.36.* range. Here's a little translation for you from their site about REBI, which makes no mention of robots.txt. Nor did the bot ask for that file when it visited my site today, so it's behaving badly.
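If you want to spot this crawler in your own logs, here's a minimal sketch that flags a request when its client IP falls in the 116.122.36.* range (written as the CIDR block 116.122.36.0/24) or its user agent starts with "REBI-Shoveler". It assumes you already have the client IP and user agent as strings for each request; the function and constant names are hypothetical, and the IP range is only what showed up on my server, so treat it as a starting point, not a complete list.

```python
import ipaddress

# Assumption: the 116.122.36.* range observed on my server, as a /24 block.
REBI_NETWORK = ipaddress.ip_network("116.122.36.0/24")
# Both UA strings seen so far begin with this prefix.
REBI_UA_PREFIX = "REBI-Shoveler"


def is_rebi_shoveler(ip: str, user_agent: str) -> bool:
    """Hypothetical helper: True if the IP is in the observed range or the UA matches."""
    try:
        in_range = ipaddress.ip_address(ip) in REBI_NETWORK
    except ValueError:
        in_range = False  # malformed IP strings are treated as non-matching
    return in_range or user_agent.startswith(REBI_UA_PREFIX)


print(is_rebi_shoveler("116.122.36.15", "Mozilla/5.0"))       # True
print(is_rebi_shoveler("203.0.113.5", "REBI-Shoveler v0.1"))  # True
print(is_rebi_shoveler("203.0.113.5", "Mozilla/5.0"))         # False
```

Matching on IP as well as user agent matters because a bot can change its UA string at will, as this one already has between versions.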

Now you know it's REBI that's shoveling shit off of your server.



Ban Proxies said...

I see your bot and raise one.

CentiverseBot, IP=

I've never seen that UA before, and the IP hasn't been a known problem.

 - - [06/Nov/2007:07:44:26 +0000] "GET / HTTP/1.1" 200 19966
 - - [29/Mar/2008:02:22:37 +0000] "GET / HTTP/1.1" 200 19568
 - - [29/Mar/2008:13:56:47 +0000] "GET / HTTP/1.1" 403 897

IncrediBILL said...

It uses this as well: "CentiverseBot - investigator"

I've also seen something called "The Centiverse Project - Spider/1.1 Beta"

No clue what it's used for at this time, so block away ;)

Doug said...

Good information, Bill, thanks! Do you have a full list of bots that should be blocked? I can add them to my firestats list. I looked at your robots.txt, and it looks like you have blocked everything but Google. I'm using Config Server firewall on my cPanel server, and it works like a charm for blocking intrusions, but it does nothing for bots unless you know their IPs and set up iptables rules to deny them.

Claus said...

The bot is a small project of mine. Normally it is only allowed to crawl the .dk, .no, .se, and .fi top-level domains, but I needed to stress test it. That is why you got a visit.

I don't use the downloaded pages for anything unless they are written in Danish, Swedish, or another Nordic language.

The purpose of the project is to create a Nordic semantic search engine - to begin with :)