Comments on IncrediBILL's Random Rants: Discovery Engine's Discobot Discovered My Bot Blocker (12 comments)

Glenn, 2012-11-18:
another year, and they're still here - changed their User Agent string though - now showing up as

Mozilla/5.0 (compatible; discoverybot/2.0; +http://discoveryengine.com/discoverybot.html

Web page hasn't changed. I can find no list of employees or officers, and the address shows up as the offices of SF Heat, an ad agency. not sure what to make of all that, other than

Anonymous, 2012-01-05:
They're still around and ignoring Disallow (but respecting crawl limits) in robots.txt. I ended up having to 403 them by user-agent in my Apache config earlier today.

Garza Girl, 2011-09-13:
Anyone re-examine the Discovery Engine (or the DiscoBot)? Would love to know where they netted out.

Russell (http://www.orphanstear.org/), 2011-09-12:
Howdy IncrediBILL,

Have you ever had said beer? Betting not. ;-)

Have you heard anything new from these folks? They are in a new IP range and crawling my site again after a long pause.

Maybe DiscoBill will come back and give us an update?

Seems like a 3+ year closed alpha phase would be a little excessive.

Either a world-changer, a

Anonymous, 2011-07-25:
3.5 years later, they still have the suspicious black page with gray text and are still introducing their "next generation search engine."
But are sporting a new IP: 38.101.148.126 with discobot/1.1.

Anonymous, 2008-05-15:
Bill Mydlowec. Are you folks really from Google and Stanford? And did they forget to teach you the basics of user friendliness? I could not read a single word on your website. Black background with some grey scribble on it, or so it seemed to me. Someone already skeptical, if not downright suspicious, of new crawlers would think it was deliberate!

Anonymous, 2008-04-30:
Bill, I will take you up on the beer offer. FYI here is an announcement we put out today:

http://www.discoveryengine.com/news/pr-alex.html

IncrediBILL, 2008-04-14:
Bill, I block all data centers because of the high volume of scrapers, spammers and proxy sites hosted in said data centers.

Besides, my high volume sites are whitelisted only (except this blog) so nothing gets in the front gates if I don't want it to get in.

So riddle me this, why doesn't Discobot support full trip DNS verification like Google?

If you want a leg up on

Anonymous, 2008-04-13:
Hello, I'm the CEO of Discovery Engine. I wanted to say that we are not a porn scraper or spam site!

Our company was founded by computer scientists from Stanford and Google. We are building a new web-scale search engine to be launched publicly later this year.

The discobot is downloading pages to help users of our beta service find your content. It is OK if you want to block

IncrediBILL, 2008-04-08:
Dude, you're way behind in this game.

Just search my blog for "porn scraper" (http://incredibill.blogspot.com/search?q=porn+scraper) and see what pops up.

Anonymous, 2008-04-07:
I am curious, IncrediBILL, if you have ever tried to track back the source of these scrapers. I long suspected porn, loans, gambling and prescription drugs, but was very surprised when the initial scrapers on a new site came from IPs and user agents related to shopping sites.

Anonymous, 2008-04-05:
Thanks for the tip. Surprisingly, they weren't blocked with their new user agent.

You might remember their previous user agent was disco/Nutch-1.0-dev (experimental crawler; www.discoveryengine.com; disco-crawl@discoveryengine.com).

That Nutch reference caused their requests to be assigned a lower priority score in my processing engine ;-)

FWIW,
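
For readers wondering what the "full trip DNS verification" mentioned in the thread looks like in practice, here is a minimal Python sketch of the general idea: reverse-resolve the visiting IP to a hostname, check that the hostname falls under a domain the crawler operator publishes, then resolve that hostname forward and confirm it maps back to the same IP. The function name `verify_crawler_ip`, its parameters, and the injectable lookup hooks are assumptions made for illustration. This is not IncrediBILL's blocker and not anything Discovery Engine documented. The default lookups handle IPv4 only.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Round-trip (forward-confirmed reverse) DNS check for a crawler IP.

    allowed_suffixes: hostname suffixes the crawler operator publishes,
    written with a leading dot (for example ".googlebot.com") so that a
    lookalike such as "evilgooglebot.com" does not match.
    reverse_lookup / forward_lookup: hypothetical hooks so the DNS calls
    can be swapped out; they default to the standard socket functions.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); IPv4 only
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse lookup (IP -> hostname). No PTR record means no match.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    host = host.rstrip(".").lower()

    # Step 2: the hostname must sit under one of the published domains.
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False

    # Step 3: forward lookup (hostname -> IPs) must include the original IP.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

A caller would pass the visitor's address plus the crawler's published domains, for example `verify_crawler_ip(remote_ip, (".googlebot.com", ".google.com"))`. The check only works for crawlers whose operators maintain matching reverse and forward DNS records.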