Tuesday, August 15, 2006

Multiple Scrape Attempts from Google IPs?

OK, if anyone can shed any light on this, that would be nice. Web accelerator, maybe?

Had a batch of "Avant Browser" requests. None got answered because this SNAFU request early on tripped the bot trap, yet they just kept coming:

64.233.173.89 - "GET /#top" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser; Avant Browser; .NET CLR 1.0.3705)"
Google didn't even respond properly to reverse DNS, sloppy shit:
nslookup 64.233.173.89

** server can't find 89.173.233.64.in-addr.arpa: NXDOMAIN
But it's certainly a Google IP:
whois 64.233.173.89

OrgName: Google Inc.
OrgID: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US

NetRange: 64.233.160.0 - 64.233.191.255
Then look at THIS one also from Google, what the hell?
72.14.194.19 - "GET /robots.txt" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6"
Same reverse DNS problem:
nslookup 72.14.194.19

Non-authoritative answer:

*** Can't find 19.194.14.72.in-addr.arpa.: No answer
Just to make sure it wasn't my servers, I checked DNSSTUFF.com, same result.
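
For anyone who wants to repeat the reverse DNS check from a script instead of nslookup, here is a minimal Python sketch. The function names are made up for illustration, and the lookup simply asks whatever resolver the machine is configured with, so results can differ from what nslookup showed above.

```python
import ipaddress
import socket


def reverse_pointer_name(ip):
    """Return the in-addr.arpa name that a reverse (PTR) lookup queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError;
        # either one means no usable reverse record came back.
        return None


if __name__ == "__main__":
    for ip in ("64.233.173.89", "72.14.194.19"):
        print(ip, reverse_pointer_name(ip), reverse_lookup(ip))
```

A None result corresponds to the NXDOMAIN and "No answer" responses shown above.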

Yet, it's Google:
whois 72.14.194.19

OrgName: Google Inc.
OrgID: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US

NetRange: 72.14.192.0 - 72.14.255.255
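
Since reverse DNS is no help here, one fallback is to test the IP against the NetRange values from whois. This is only a rough sketch: the two ranges are copied from the whois output in this post and may not match what Google owns at any other time, and the names GOOGLE_RANGES, networks_from_ranges and in_ranges are invented for the example.

```python
import ipaddress

# NetRange values copied from the whois output in this post (assumption:
# they are only as current as that output).
GOOGLE_RANGES = [
    ("64.233.160.0", "64.233.191.255"),
    ("72.14.192.0", "72.14.255.255"),
]


def networks_from_ranges(ranges):
    """Convert (first, last) address pairs into a list of CIDR networks."""
    nets = []
    for first, last in ranges:
        nets.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return nets


def in_ranges(ip, nets):
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)


if __name__ == "__main__":
    nets = networks_from_ranges(GOOGLE_RANGES)
    for ip in ("64.233.173.89", "72.14.194.19"):
        print(ip, in_ranges(ip, nets))
```

A range match only says who owns the address block, not what the machine behind the IP is doing.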
OK, has anyone from Google got a clue what in the hell is going on?

Anyone?

This is unacceptable whatever it is!

5 comments:

Anonymous said...

Morning,
Most likely these were requests through the Google Web Accelerator. Since the accelerator is a transparent proxy, scripts that track Client-IP or X-Forwarded-For headers may be able to reveal the origin. I see these requests on a regular basis, but so far no abusive activity (luckily).

Olliver

IncrediBILL said...

Yup, I didn't bother looking in my "ultimate" log file last night and they are all definitely using a proxy at Google.

However, there is no Proxy or VIA information, just the FORWARDED IP.

Sloppy and half-baked if you ask me.
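
For scripts that want to pull the originating IP out of the forwarded headers discussed above, a minimal Python sketch might look like this. It assumes the request headers are available as a plain dict, and the function name forwarded_client_ip is invented for the example. These headers are supplied by the client or proxy and are trivial to fake, so treat the result as a hint, not proof.

```python
import ipaddress


def forwarded_client_ip(headers):
    """Return the first valid IP from X-Forwarded-For or Client-IP, else None."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("x-forwarded-for", "client-ip"):
        value = lowered.get(name)
        if not value:
            continue
        # X-Forwarded-For may hold a comma-separated chain of addresses;
        # by convention the first entry is the original client.
        candidate = value.split(",")[0].strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # not a parseable IP, try the next header
    return None


if __name__ == "__main__":
    # 203.0.113.7 is a documentation address used as a stand-in client.
    print(forwarded_client_ip({"X-Forwarded-For": "203.0.113.7, 64.233.173.89"}))
```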

Anonymous said...

I think that the IP addresses used for proxying should resolve to a meaningful hostname, so people can immediately recognise the machine's purpose.

IncrediBILL said...

I concur but my inquiry to Google has been met with strange silence ;)

Anonymous said...

Uhm, that doesn't necessarily have to mean anything. Probably they're working on their algorithm to get this problem sorted, rather than committing themselves to a blunt hand job (manual DNS configuration change) ;-)