Comments on IncrediBILL's Random Rants: Discovery Engine's Discobot Discovered My Bot Bloc...
Comment feed by IncrediBILL, last updated 2008-05-15 05:02 -07:00 (7 comments, newest first).

2008-05-15 05:02 -07:00
Bill Mydlowec, are you folks really from Google and Stanford? And did they forget to teach you the basics of user friendliness? I could not read a single word on your website: a black background with some grey scribble on it, or so it seemed to me. Someone who is already skeptical, if not downright suspicious, of new crawlers might think it was deliberate!
Posted by museAmuse

2008-04-30 06:51 -07:00
Bill, I will take you up on the beer offer.
FYI, here is an announcement we put out today:

http://www.discoveryengine.com/news/pr-alex.html
Posted by Bill

2008-04-14 01:27 -07:00
Bill, I block all data centers because of the high volume of scrapers, spammers and proxy sites hosted in said data centers.

Besides, my high-volume sites are whitelist-only (except this blog), so nothing gets in the front gates if I don't want it to get in.

So riddle me this: why doesn't Discobot support full round-trip DNS verification like Google?

If you want a leg up on...
Posted by IncrediBILL

2008-04-13 22:47 -07:00
Hello, I'm the CEO of Discovery Engine. I wanted to say that we are not a porn scraper or spam site!

Our company was founded by computer scientists from Stanford and Google. We are building a new web-scale search engine to be launched publicly later this year.

The discobot is downloading pages to help users of our beta service find your content.
It is OK if you want to block...
Posted by Bill Mydlowec (http://xenon.stanford.edu/~myd/)

2008-04-08 02:33 -07:00
Dude, you're way behind in this game.

Just search my blog for "porn scraper" (http://incredibill.blogspot.com/search?q=porn+scraper) and see what pops up.
Posted by IncrediBILL

2008-04-07 23:15 -07:00
I am curious, IncrediBILL, whether you have ever tried to track back the source of these scrapers. I had long suspected porn, loans, gambling and prescription drugs, but was very surprised when the initial scrapers on a new site came from IPs and user agents related to shopping sites.
Posted by Protectyourcontent (http://www.protectyourcontent.org)

2008-04-05 02:33 -07:00
Thanks for the tip. Surprisingly, they weren't blocked with their new user agent.

You might remember their previous user agent was "disco/Nutch-1.0-dev (experimental crawler; www.discoveryengine.com; disco-crawl@discoveryengine.com)".

That Nutch reference caused their requests to be assigned a lower priority score in my processing engine ;-)

FWIW, Johann
Posted by Johann (http://johannburkard.de)
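The DNS verification IncrediBILL asks about in his 2008-04-14 comment is the round-trip check Google recommends for confirming that a visitor claiming to be Googlebot really is: reverse-resolve the visiting IP, check that the hostname falls under Google's crawler domains, then forward-resolve that hostname and confirm it maps back to the same IP. Below is a minimal Python sketch of that idea. The function names are illustrative, and the suffix list is an example for Googlebot only; a crawler like Discobot would need its operator to publish its own domain for this to work. The lookups are injectable so the logic can be exercised without live DNS.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Example suffixes for Googlebot (assumption: check the operator's current docs).
# The leading dots stop a name like "evilgooglebot.com" from matching.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def system_reverse_lookup(ip: str) -> str:
    """PTR lookup via the OS resolver; raises OSError if no record exists."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def system_forward_lookup(hostname: str) -> set[str]:
    """A/AAAA lookup via the OS resolver; raises OSError on failure."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Iterable[str] = GOOGLEBOT_SUFFIXES,
    reverse_lookup: Callable[[str], str] = system_reverse_lookup,
    forward_lookup: Callable[[str], set[str]] = system_forward_lookup,
) -> bool:
    """Round-trip (forward-confirmed reverse) DNS check for a claimed crawler IP."""
    try:
        target = ipaddress.ip_address(ip)
        # Step 1: reverse lookup of the visiting IP address.
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except (ValueError, OSError):
        return False
    # Step 2: the hostname must sit under one of the operator's domains.
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False
    # Step 3: the forward lookup must map back to the same IP; otherwise anyone
    # who controls the PTR record for their own IP block could fake the name.
    try:
        forward = {ipaddress.ip_address(addr) for addr in forward_lookup(hostname)}
    except (ValueError, OSError):
        return False
    return target in forward
```

In production the defaults use the system resolver, so `verify_crawler_ip(remote_ip)` performs real lookups; results are usually cached, since doing two DNS queries per request would be slow.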
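Johann doesn't describe how his processing engine scores requests, so the following is only a hypothetical illustration of substring-based user-agent scoring. The penalty table, weights and base score are made up; the point is that the old "disco/Nutch-1.0-dev (experimental crawler; ...)" string trips several rules, while a renamed user agent (here a hypothetical "discobot/1.0") trips none.

```python
# Hypothetical penalty table: lowercase substrings and the points each costs.
# These weights are invented for illustration, not taken from any real engine.
UA_PENALTIES: dict[str, int] = {
    "nutch": 50,                 # generic open-source crawler framework
    "experimental crawler": 20,  # self-described unfinished crawler
    "-dev": 10,                  # development build marker
}
BASE_PRIORITY = 100


def ua_priority(user_agent: str,
                penalties: dict[str, int] = UA_PENALTIES,
                base: int = BASE_PRIORITY) -> int:
    """Return a priority score; lower means the request is treated more warily."""
    ua = user_agent.lower()
    total_penalty = sum(points for token, points in penalties.items() if token in ua)
    # Clamp at zero so stacked penalties never produce a negative score.
    return max(0, base - total_penalty)
```

Because rules like these key on literal substrings, a crawler that drops "Nutch" from its user agent gets the full base score, which is one way a renamed bot can pass unnoticed; user-agent checks are therefore usually combined with IP or DNS verification.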