Saturday, November 04, 2006

Hunting PicScout, the Copyright Crawler Getty Uses

Everyone knows Getty Images uses PicScout, but nobody seems to know anything about PicScout's crawler: no user agent information, no IPs they crawl from, nothing. When someone asked me if I knew anything about them, I did a little research and found nothing related ANYWHERE, not even anything initially obvious in my bot blocker log files. Based on my initial observations, PicScout actually seemed to be hiding better than all the other corporate crawlers I've researched to date, but maybe we can shed some light on this.

Not that I advocate copyright violation. As a matter of fact, I'm a staunch copyright defender.

However, crawling under the radar, refusing to honor robots.txt or identify your bot in any fashion, and bypassing website security measures gets under my skin more than anything, so I picked up the gauntlet and tried to find signs of PicScout activity.

After the usual simple research methods failed, I decided to start by seeing where they were hosted.

host picscout.com
picscout.com has address 82.80.254.37

host 82.80.254.37
37.254.80.82.in-addr.arpa domain name pointer bzq-80-254-37.dcenter.bezeqint.net.
Ah ha!

I remember a rash of activity I shut down from bezeqint.net a while back so I looked a little deeper into this angle.
inetnum: 82.80.248.0 - 82.80.255.255
netname: BEZEQINT-HOSTING
descr: BEZEQINT-HOSTING
country: IL
Ah yes, they're the guys from Israel that were hammering one of my servers.

I found a high volume of crawling from the IPs below. The bot blocker trapped it automatically and the challenges were never answered, so it was definitely bot traffic.
82.80.249.195
82.80.249.196
82.80.249.197
82.80.249.201
82.80.249.202
82.80.249.203
82.80.249.204
82.80.252.130
These IPs have only been spotted using the following two user agents:
Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; (R1 1.1); .NET CLR 1.1.4322)
My theory is that this is PicScout attempting to crawl under the radar.

Check your logs, people. See if you have any activity in this range; I think it's them.
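
If you'd rather not eyeball your logs, here's a rough sketch of how that check could be automated in Python. It assumes the client IP is the first field on each log line, as in Apache's common and combined log formats. The function name `suspect_hits` and the sample log lines are made up for illustration; only the range itself comes from the whois output above.

```python
import ipaddress

# Range taken from the whois output: 82.80.248.0 - 82.80.255.255
SUSPECT_FIRST = ipaddress.ip_address("82.80.248.0")
SUSPECT_LAST = ipaddress.ip_address("82.80.255.255")


def suspect_hits(log_lines):
    """Yield (ip, line) for each log line whose client IP is in the range.

    Assumption: the client IP is the first whitespace-separated field.
    Adjust the parsing if your log format differs.
    """
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is a hostname or junk, skip it
        # Only compare IPv4 addresses; mixing versions raises TypeError.
        if ip.version == 4 and SUSPECT_FIRST <= ip <= SUSPECT_LAST:
            yield str(ip), line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical sample lines, not real log data.
    sample = [
        '82.80.249.195 - - [04/Nov/2006:10:00:00 -0800] "GET / HTTP/1.1" 200 512',
        '10.0.0.1 - - [04/Nov/2006:10:00:01 -0800] "GET / HTTP/1.1" 200 512',
    ]
    for ip, line in suspect_hits(sample):
        print(ip, "->", line)
```

To run it against a real file, pass an open file object instead of the sample list, e.g. `suspect_hits(open("access.log"))`, where `access.log` is whatever your server calls its log.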

I would just block this range out of principle at this point, since the IPs crawling from it aren't honoring any internet standards, and if it is PicScout, blocking them could possibly save you a massive chunk of money if some web designer used stolen images building your website.
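
Most firewalls and server configs want a CIDR block rather than a first-last range. As a small sketch, Python's standard `ipaddress` module can do the conversion. The helper name `range_to_cidrs` is my own invention, and the range is the BEZEQINT-HOSTING one from the whois output above.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


if __name__ == "__main__":
    # inetnum: 82.80.248.0 - 82.80.255.255 (from the whois output)
    for cidr in range_to_cidrs("82.80.248.0", "82.80.255.255"):
        print(cidr)
```

For this range that comes out as a single block, 82.80.248.0/21, which you can paste into whatever block list your setup uses.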

UPDATE:

After posting this, the fine people from PicScout visited the blog and revealed more information about their facilities.

The log showed this visit:
Host Name mail.picscout.com
IP Address 62.0.8.2
Country Israel
ISP Nv-picscout
The information I found from that, including another IP block, is here:
inetnum: 62.0.8.0 - 62.0.8.255
netname: NV-PICSCOUT
descr: NV-PICSCOUT
country: IL
admin-c: OG570-RIPE
tech-c: NN105-RIPE
status: ASSIGNED PA
mnt-by: NV-MNT-RIPE
mnt-lower: NV-MNT-RIPE
source: RIPE # Filtered
So, there are a few more IPs you might want to block, but I doubt they're scanning from the office.

UPDATE: Caught Getty keeping an eye on everyone today.

My blog log showed this:
Time: 12th June 2007 12:24:53 PM
Host Name outbound.gettyimages.com
IP Address 206.28.72.1
Country United States
Region Washington
City Seattle
ISP Getty Images
Referrer: http://www.webproworld.com/graphics-design-discussion-forum/56384-invoiced-getty-images-unlawful-use-images.html

It appears they were snooping on WebProWorld and followed the link here. The user agent claimed to be MSIE 6.0 but it's possibly an automated crawler, hard to say.

Anyway, we're watching you watch us, it works both ways.

37 comments:

nanyo said...

User agent | IP | Host | HTML | IMG | IMG-JS | COOKIE | Timestamp
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5 | 84.108.136.230 | bzq-84-108-136-230.cablep.bezeqint.net | [yes] | NO | NO | NO | 20061012045626
libwww-perl/5.803 | 84.108.136.230 | bzq-84-108-136-230.cablep.bezeqint.net | [yes] | NO | NO | NO | 20061012045633
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) | 84.109.26.147 | bzq-84-109-26-147.red.bezeqint.net | [yes] | YES | YES | YES | 20061014205045
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts) | 84.109.29.35 | bzq-84-109-29-35.red.bezeqint.net | [yes] | YES | YES | YES | 20061019214921
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050223 Firefox/1.0.1 | 84.110.210.157 | bzq-84-110-210-157.red.bezeqint.net | [???] | NO | NO | NO | 20060913224304
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) | 88.152.42.149 | bzq-88-152-42-149.red.bezeqint.net | [yes] | YES | YES | YES | 20061019074952
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Crazy Browser 2.0.0 Beta 1; .NET CLR 1.1.4322) | 88.153.135.121 | bzq-88-153-135-121.red.bezeqint.net | [yes] | YES | YES | NO | 20061103173056
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) | 88.153.145.151 | bzq-88-153-145-151.red.bezeqint.net | [yes] | YES | YES | NO | 20060930185251
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322) | 88.153.145.151 | bzq-88-153-145-151.red.bezeqint.net | [yes] | YES | NO | YES | 20060930185251
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {2B0CCDE2-000D-9634-2794-328BA1B6DB4A}; Maxthon; .NET CLR 1.1.4322) | 88.154.251.249 | bzq-88-154-251-249.red.bezeqint.net | [???] | NO | NO | NO | 20060613071745
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Avant Browser) | 212.25.124.145 | bzq-25-124-145.cust.bezeqint.net | [???] | NO | NO | NO | 20060905070458
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727) | 212.179.4.78 | bzq-179-4-78.cust.bezeqint.net | [yes] | YES | YES | NO | 20061017072031



My nanyo script captured these visits. Only a few of them look suspect; most are visits from normal users.

Only 84.108.136.230 is a real suspect: it did not load any related images, did not execute JS, and did not accept cookies.

88.153.145.151 could be a spammer script or similar, or just a browser with high protection.

The others are human visits.

Anonymous said...

Hey Bill, you might also like to investigate and/or block:

192.114.146.0/23
192.115.184.0/21
212.25.64.0/19

just to be sure. They are also Bezeqint-associated IP blocks. Any of them may be used for Picscout, or for other dubious purposes. You just never can tell with some of these IL netblocks and businesses.

skore said...

Bill - thanks again for your help with this - there is really no information out there about it.

Marshall Clark said...

I have a stock photo agency client that uses PicScout to hunt down unpaid usage of their images. The system is so good that a significant portion of their revenue comes directly from copyright infringement settlements.

My client knew nothing about the system and PicScout wouldn't give them any details on the specifics. These guys seem like savvy operators. It wouldn't surprise me if they spider from multiple IPs and domains.

Anonymous said...

Hey bill you might also want to investigate my ass?

If there's no site then there is nothing here to be spammed. If there is nothing here to be spammed then you have no problem ;)

Get off the internet.

IncrediBILL said...

Anonymous, that post isn't about spam but it's probably not your fault the educational system failed you and left you illiterate.

Anonymous said...

Like every week, our Israeli friends kindly visited our bot trap. This week's visits told us the following:

Bot trap visit No.1:
host: bzq-84-110-232-8.red.bezeqint.net
IP: 84.110.232.8

Bot trap visit No.2:
host: proxy.asianet.co.th
IP: 203.144.144.164

Bot trap visit No.3:
host: 24-247-250-183.dhcp.aldl.mi.charter.com
IP: 24.247.250.183

Bot trap visit No.4:
host: 218.154.47.153
IP: 218.154.47.153

Bot trap visit No.5:
host: bzq-84-110-252-195.red.bezeqint.net
IP: 84.110.252.195

Bot trap visit No.6:
host: bzq-84-110-240-249.red.bezeqint.net
IP: 84.110.240.249

Bot trap visit No.7:
host: bzq-84-110-248-198.red.bezeqint.net
IP: 84.110.248.198

Bot trap visit No.8:
host: bzq-84-110-246-219.red.bezeqint.net
IP: 84.110.246.219

Might be of interest.
And to the Israel guys: "See you next week ... in our bot trap".

Anonymous said...

Forgot to mention: our Israel "let-me-in-your-bot-trap"-guys always use the user agent string:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Israeli Webmaster said...

Hello,
I'm an Israeli webmaster. I just wanted to let you know that PicScout's crawling policies might be dubious, but bezeqint is one of the leading Israeli ISPs, and blocking all traffic related to it will probably block lots of legitimate users as well. So unless your business really doesn't care about international visitors, I wouldn't block the whole range, only the addresses I'm SURE belong to the bad crawler.

Anonymous said...

Bill,
I am a photographer. This type of discussion tends to get philosophical for those who are not making their living from licensing their art, but I think you have really chosen the wrong side of the copyright battle. The fact is that the type of "transparent web bots" you endorse are the ones that practically help to infringe my copyrights and damage my livelihood all the time. Crawlers such as the ones used by Getty and Corbis (picscout and digimarc) are the ones that help educate copyright violators and teach them a lesson.

What you are doing in your blog, is similar to a boy that sits on the fence and helps the thieves know when the police are coming.

You should know better…

IncrediBILL said...

Scuse me?

I'm not endorsing "transparent web bots", I'm against them. It works both ways, the "transparent web bots" may be stealing your stuff but that doesn't make it right for other "transparent web bots" to spy on my site to see if I've been stealing.

Understand?

I'm fully aware of the issues facing photographers but that doesn't give some company the right to crawl my site without permission.

I pay for my own dedicated server and bandwidth and have the right to block anything that's automated that comes banging on my server.

It might be a stretch, but it's possible hiding as a human to gain unauthorized access to my web server falls under "Computer Hacking and Unauthorized Access Laws"

http://www.ncsl.org/programs/lis/CIP/hacklaw.htm

It wouldn't be too hard to make a case for trespassing: a real bot with an appropriately named user agent would be automatically blocked, so these bots are deliberately hiding to bypass security, which is trespassing.

Photographer said...

As a photographer whose images are with both Getty and Corbis, I think you really have gone too far at this point. We (professional photographers and stock agencies) spend our entire lives shooting images, just so people like you can come and try to destroy us? Don't you realize that we actually earn our money from licensing royalties? So many images are misused online simply because #$@! people like you promote copyright infringement on behalf of "the good and smart" webmasters. Get a life!

IncrediBILL said...

Pay attention Mr. Misinformed Photographer, I actually make a chunk of change from photography as well but the activities of Picscout might even violate US "Unauthorized Access" laws.

I never said they shouldn't do what they do, I said they should play by the rules and let people know they are being crawled and opt to be crawled or not.

Since when is the commission of a potential cyber crime justified by claiming to protect copyright?

That's right up there with having a mass protest orgy to protect virginity.

Anonymous said...

Hey Bill - I'm a developer with a reseller hosting account - how can I block this Picscout on any sites we do? (They're all legit, by the way, but I don't agree with the way Picscout is carrying out its business.)

Do you use the robots.txt file, or do you have to make changes on the server?

And if they are server changes do you think I will be able to even though I have a reseller account?

Thanks in advance mate!

G

Photographer said...

Hello photographer,
I am a photographer too. But I am also a web server administrator and I believe Bill is definitely in the right for informing people of this access.
As for why, I could write a long paragraph -- but I see both sides. If you don't want your photos to be stolen, watermark them. Put copyright messages on them. Prevent people from downloading them. Then look for copyright violations manually, of your own volition. Don't enlist Picscout to steal considerable bandwidth and server resources when you could do it yourself.

Peter said...

Don't worry - I support you entirely.

Anonymous said...

Bill take a look at this blog

http://images-public-free.blogspot.com/

In 26 weeks' time I could remove that blog. By then (assuming 100 downloads per week) a conservative estimate would be 2600 users.

In 3 years' time I reinstate the blog, but this time declaring that the image cost is £2000 each. I could use my tracking software to locate some of the 2600 sites using it. I send them all demands for £5,000.
Some will have kept a screenshot of the original blog, and maybe some will have checked out my credentials, but I can guarantee that at least 2000 poor suckers will have no proof of where or how they got those images, particularly after a time span of 3 years.


Now do you see the scam being operated by getty and Picscout!!!!!!!!!!

Anonymous said...

How about the fact that Getty is stealing from webmasters by absorbing their bandwidth?

IncrediBILL said...

I see them keeping an eye on this post all the time too..

Like today:

Tel Aviv, Tel Aviv, Israel
bzq-88-154-50-163.red.bezeqint.net (88.154.50.163)
www.google.co.il/search?hl=iw&q=mail.picscout.com&meta=

If it's not them, why would anyone else be tracking mail.picscout.com in Google, which basically points straight to this page?

Anonymous said...

I have a friend who works in picscout, here in israel. i did not know their crawler was that sophisticated and that nosy.

I never had a strong opinion about what they do, and I do think that ignoring robots.txt should get under everyone's skin (even the ones who stand to benefit), but how do you propose they deal with sites that hide behind the txt to keep stealing "their" images? Does it really matter if they use a crawler or just 50 people sitting in a room, "browsing"?

(the guy's point about claiming royalties after 3 years is good, but there it won't matter if a human or a robot made the find)

guy

Anonymous said...

Rec'd a lovely invoice from Getty dated mid-December 2007.

Thanks to your blog post I found PicScout's IPs in my logs. They are still using the 82.80.249.* IPs.

They have now been blocked server-wide via my firewall.

Just wanted to let others know the data is valid.

Anonymous said...

Picscout has a partner called NCS Recovery which follows their spiders into the telephone world. They call companies that have used images advertised as "free" and threaten them with legal action if they don't pay up. Charges claimed through Picscout and NCS are many times the going rate. For Getty Images, charges in our case were $1200 for an image sold at $49 on gettyimages.com. I'm all for protecting artists' income; but I'm not fond of attempted small time extortion.

Anonymous said...

I got a letter from Gettyimages and NCS Recovery regarding two pictures. They are asking $2500 for the two pictures, and they are selling the pictures online for 150 each. I offered to buy the pictures and pay the same amount for the infringement, and they said NO... so then, are they really protecting photographers or trying to scare people?

Anonymous said...

Well, I got a letter from Gettyimages regarding two pictures, but what sucks is I got the pictures back when they were free, and they're both low-resolution photos! Now they're coming back "3 years later" on good people for big cash.

Anonymous said...

Has anyone actually paid them? Better yet, has anyone actually gone to court over this? We got one of these NCS letters as well; the images were used briefly and we took them down during a redesign, well prior to receiving any notification from either Getty directly or via a 3rd party. We had no idea the images were copyrighted and obtained them from sources deemed to be reliable (and now out of business), but as all of us know, ignorance is no excuse for breaking the law, son...

We went to the Getty website to see what it would cost to acquire the licenses and it was less than 10% of what they were claiming in their "demand" letter. I'm not a lawyer, but in my opinion it seems like extortion.

Pay up or we'll take legal action. Seriously, if they asked for the amount the licenses actually cost vs. the inflated amount that they are claiming in their invoice, we probably would have just cut the check already... Think we're gonna roll the dice and ignore them, but still would like to know if anyone has actually paid them.

Dave said...

Dear Picscout defenders:

I'm all for takedown notices a la the DMCA, followed by a legal bill if the infringement recurs.

Most users have found their images through some bad clip art library or through google images. Watermarking your images would do more to prevent piracy than encouraging users to infringe, then making up arbitrary prices after they infringe. Most of the time the infringers are innocent parties who have been burned by an unscrupulous web designer.

OWG said...

The problem here is that Getty/Corbis/PicScout are doing the equivalent of breaking into someone's house to see if any of the property in the house belongs to them. What they are doing is illegal.

Here is the thing: my server is MY SERVER. If I am running test sites for clients to evaluate, and I am using Getty positionals, then I have a LEGAL RIGHT to do so. You can NOT breach copyright by looking at an image.

The process would be:
1. Go to iStock and get a selection of images in the lightbox. The client makes selections and we positional them. If the client says yes, we buy a licence; if the client says no, we don't buy it and it is discarded.

Now imagine the spider arrives during this process!

The robots exclusion protocol is there for a reason. People pay for bandwidth; PicScout is illegally using bandwidth the INSTANT it ignores the robots exclusion protocol, and therefore they are committing theft!

Anonymous said...

I was reading through this quite interesting blog...after my company was caught "infringing" (unintentionally). This was quite some time ago, however, we hired a designer and paid them an exorbitant amount of money to design a nice site for our company.

While I am sure you would love to know, I would rather keep our company name private. You may recognize the name and I'm sure our President would frown upon the spill of information.

Our company was sued by a photographer that Getty represents. We were ordered to pay just over $12k for two images! Please don't ask all of the legalities involved, because honestly-I don't know. Rumor has it, we were even able to provide proof of our agreement with the web designer.

From what I understand, Getty Images didn't initiate litigation since they don't own the copyright registration and it was completely up to the photographer (who did).

I guess this is a huge deterrent factor and will prevent people from using websites to promote their companies, just out of fear. It's ridiculous.

Anonymous said...

I found this thread more than interesting.

To the Israeli webmaster I say: yes, as a web designer and hosting company, no doubt we will block IP segments, even if it's a whole country or continent, if we get harassed and extorted by overzealous copyright holders like Getty and Corbis.

They simply found and exploited a simple way to bully and extort money from innocent people by threatening them.

To the photographer I say: if you need to hunt people over the internet to survive from your "art", you must be one damn real bad photographer and should change to something like hairdressing or window washing... I am a photographer too and I don't even need the internet to survive... pathetic...

No companies or institutions are above the law. I have been crawling the internet since one of my customers told me they had been bullied by Getty. This is frivolous extortion: hard-to-prove copyright claims, expired notices with expired discounts on them.

Just browse a little bit, you'll find out that this is completely insane.

Anonymous said...

I would like to add to my previous comment the following,

1. Getty is also the owner of iStockPhoto (a royalty-free graphic resources website). If you go there, you'll see how well made the watermarks are on every single picture.

2. Now go back to Getty and observe how poor and ineffective the watermarks on the images are.

3. There are so many techniques and software packages available on the market for watermarking multiple images at the same time that there's no excuse whatsoever for not improving the watermark used by gettyimages.

What is this all about? ENTRAPMENT!!

Many of Getty's pictures end up on Google Images, which makes them so easy to use.

My belief is that it's a well-orchestrated setup to extort more money out of the pockets of ordinary people.

My best counsel is not to pay them. If you have to pay something to someone, better it be your lawyer; for a fraction of the price, he will get you out of this in one letter.

So far I have seen only one anonymous post claiming to have received a Federal court summons. No more details... Many lawyers watching Getty closely have reported no action taken by them to sue anyone.

The last time I saw such a thing, it was performed by the mafia collecting for protection...

Anonymous said...

Well this guy made a website out of this extortion by getty images
http://extortionletterinfo.com/

There's a podcast where he interviews a copyright lawyer on this issue

Anonymous said...

Getty images has an interesting Website Usage Terms and Conditions. All webmasters are encouraged to use their own weapons. You can forbid humans and robots to access your website for purposes you don't grant (typically customers and real search engines). If I understood their terms and conditions, Google is forbidden to spider their website. How can we inform Google that they should not spider Getty Images, therefore deleting them from their SERP? I guess that Getty has just bought a rope to hang itself!

Anonymous said...

I am a professional designer, photographer and small business owner. One of my larger clients was just served a 'cease & desist' letter from Getty for a rights-managed image I used AS A PLACEHOLDER on the site that the client was supposed to provide a replacement for. They never did, and I completely forgot about that image. That particular image is over TEN YEARS old and it's not even a great shot (hence the use of it as a placeholder). To my knowledge they did not demand any payment from my client, just to remove the images, which we did immediately. HOWEVER....after spending the better part of my day researching this subject, I'm shocked, appalled and now extremely paranoid about the 1,000+ other websites I've designed over the past decade!?!? I always purchase all of my stock photography legitimately, but do I keep records of those purchases from 7..9...10 years ago?? Of course not. Do I keep records of stock purchases made by my clients who then provided the images to me to use in their designs? Of course not. So you can see that my paranoia is justified!

I run several of my own servers, so you can absolutely BET that I'll be blocking PicScout's bots and possibly all traffic from Israel. I don't care what anyone thinks, this is total Big Brother B.S.

Oh, and I totally agree with the poster who talked about photographers getting all high and mighty about their 'art being stolen' Ummm, if you're that desperate to track down potentially lost revenue in this method, then you're clearly clinging to a failed business model, or your work just plain sucks. This is an invasion of my privacy and ethically goes against everything I stand for.

Anonymous said...

Looking forward to joining PicScout and hope they stay innovative enough to avoid your primitive attempts to block them. If your server is online with the world wide web, expect to have your site checked for stolen images - plain and simple

IncrediBILL said...

Picscout has no business on my site without permission so go join them and then go fuck yourself.

Anonymous said...

Hello Bill - do you have an updated list of IPs to block? I also don't want them to use up all my bandwidth, and I just don't like companies like that.

Anonymous said...

I'm blocking Picscout IPs not because I have anything against Getty or Picscout, but because unlike all search crawlers etc., Picscout does not obey the Crawl-delay rule. I'm using, not by choice, a badly designed PHP software that gets really sluggish when it gets hammered by Picscout trying to download all images at once (and all dynamic resizes of every image).

I'll drop the blocks if Picscout starts to obey at least reasonable Crawl-delay rules (up to 10 seconds or something).