Thursday, August 09, 2007
CONTACT US Form Spammers Monitor Submit Results!
I have one CONTACT US form on a website that I leave less protected than my other forms, just so customers with their browser security dialed up tight can drop a line without getting caught in the anti-spam snares.
Mind you, this page only sends an email to ME, nothing public, nothing anyone will ever see, as I sure as hell won't look at the spam other than to delete it, so it gives them ZERO value for their efforts, yet they persist.
So in the beginning there was a small trickle of spam on this form that started to escalate.
The first thing I did ages ago was change the form to require a POST, just to stop their simple GETs from dumping junk.
Eventually they switched to using a POST, which means someone was monitoring the response codes, but WHY?
The trickle of spam eventually came back.
So I changed a couple of fields just to alter the process and break their auto-spam tool.
A nice long quiet period followed, but obviously someone is watching because they adapted yet again.
Fine, so I made it a requirement that the page rejects the POST unless the visitor has accessed some other page on my site first, which is what a normal user would do anyway.
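The idea is dead simple; a rough sketch of it in Python (not my actual code, and the names and the 30 minute window are just for illustration):

```python
import time

RECENT_VISITORS = {}      # ip -> timestamp of the last normal page view
VISIT_WINDOW = 30 * 60    # a prior page view within 30 minutes counts

def note_page_view(ip):
    """Called from every regular page on the site."""
    RECENT_VISITORS[ip] = time.time()

def contact_post_allowed(ip):
    """The contact form handler rejects the POST unless this IP
    browsed some other page on the site first."""
    last_seen = RECENT_VISITORS.get(ip)
    return last_seen is not None and (time.time() - last_seen) < VISIT_WINDOW
```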
This caused a longer period of blissful silence.
Then here comes the spam yet AGAIN!
OK, fine, let's try embedding something unique per visitor in the page, so if you don't fetch the CONTACT US page first and send that parameter back, the submit gets rejected.
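Think of it as a one-time token baked into the form; a minimal sketch of the concept (again, illustrative Python, not what actually runs on my server):

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)   # server-side secret, never sent to the client

def issue_form_token():
    """Embedded in a hidden field when the CONTACT US page is served."""
    stamp = str(int(time.time()))
    sig = hmac.new(SECRET, stamp.encode(), hashlib.sha256).hexdigest()
    return f"{stamp}:{sig}"

def token_is_valid(token, max_age=1800):
    """The POST is rejected unless the token came from a recently served page."""
    try:
        stamp, sig = token.split(":")
    except ValueError:
        return False
    expected = hmac.new(SECRET, stamp.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and (time.time() - int(stamp)) < max_age
```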
This just blew my fucking mind when a few days later they adapted to first get the page, get all parameters from the form, then POST the page!
OK, now we know someone is fucking watching this page...
Fine.
I made a change that you can't see in the HTML, it's all server side, knock your fucking socks off trying to adapt this time.
I still don't see why the spammers would bother as they're just wasting time.
Nobody will ever see their spams, NEVER EVER, but I can play this cat and mouse game as long as they can.
All this trouble just because I didn't want to annoy visitors with a captcha on a single page, or require cookies or javascript to be enabled.
If they push me too hard the captcha gets installed.
FYI, as I'm writing this I'm watching someone trying to fix their form post to my site. They've made about 10 attempts now and it's still not getting through. This must be making them nuts, as I don't give them any clue why the submit isn't working other than a generic error saying the submit failed, please try again!
Let's see what happens next...
UPDATE: The spambots hammered away at that form for days trying to figure out what I did, with literally hundreds of POST attempts from a couple of IPs. Probably the spambot herder trying to figure out my latest anti-spam hack. Then it stopped, not a single POST from those sources, and it's back to normal with only real submissions from humans.
Posted by IncrediBILL at 8/09/2007 02:06:00 PM | 11 comments
Sunday, August 05, 2007
Yahoo's RSS Feed Refresh is SLOW!
One of my sites has a dynamic RSS feed and it sends a refresh ping to Yahoo every time new content is added to the feed. Sometimes the content is added slowly over the course of the day, sometimes content is added more rapidly and new items are added to the feed almost back to back.
The code managing the feed is simple: it updates the RSS feed and pings all the refresh services in real time as the data becomes available.
If you add more than one item in a minute or two what does Yahoo say?
Refresh failed: Too soon http://www.mysite.com/myfeed.xml
Too soon for what?
Too soon for more new content?
Too soon for your crappy refresh servers to keep pace with reality.
Why don't you just queue it up? I've already told you that the content you previously had is OUT OF DATE, but noooooooo, it's TOO SOON to refresh because we're Yahoo and we have silly rules in place to protect our fragile servers.
Well guess what?
You need a new error called: "TOO LATE!" as your version of the feed is older than everyone else's that could keep up.
As a matter of fact, I thought I'd try it ONE MORE TIME, figuring that in the time it took to type this blog post Yahoo would've allowed the RSS feed update by now, so I manually pinged their server and, you guessed it: "TOO SOON! TOO SOON! WE'RE YAHOO AND WE CAN'T KEEP UP!"
Sheesh.
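Until Yahoo queues things on their end, about the only workaround on my side is to swallow the rejection and retry later. A rough sketch of that retry logic (the ping URL and the "Too soon" check are placeholders modeled on the error above, not an exact copy of Yahoo's interface):

```python
import time
import urllib.parse
import urllib.request

PING_ENDPOINT = "http://api.my.yahoo.com/rss/ping?u="   # placeholder endpoint
FEED_URL = "http://www.mysite.com/myfeed.xml"

def ping_feed(feed_url=FEED_URL, retries=5, wait=120):
    target = PING_ENDPOINT + urllib.parse.quote(feed_url, safe="")
    for _ in range(retries):
        body = urllib.request.urlopen(target).read().decode("utf-8", "replace")
        if "Too soon" not in body:
            return True        # refresh accepted
        time.sleep(wait)       # told "too soon", so wait and try again
    return False               # gave up; the next content update will re-ping anyway
```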
Posted by IncrediBILL at 8/05/2007 05:23:00 PM | 2 comments
Tuesday, July 31, 2007
Attempted Distributed Scrape from SAIX.net
This is the kind of scrape attack I warn my bot-blocking comrades in arms about, the kind they would probably miss because it's distributed over multiple IP addresses. Had the scraper not left the default user agent "Java/1.6.0_02", most of the anti-scrapers would be helpless against this type of scrape.
Here's a sample of the activity:
198.54.202.246 [ctb-cache7-vif1.saix.net.] requested 3 pages as "Java/1.6.0_02"
198.54.202.194 [ctb-cache4-vif1.saix.net.] requested 1 pages as "Java/1.6.0_02"
196.25.255.210 [rba-cache2-vif0.saix.net.] requested 3 pages as "Java/1.6.0_02"
198.54.202.195 [ctb-cache5-vif1.saix.net.] requested 3 pages as "Java/1.6.0_02"
196.25.255.218 [rrba-ip-pcache-6-vif0.saix.net.] requested 4 pages as "Java/1.6.0_02"
198.54.202.214 [rrba-ip-pcache-5-vif1.saix.net.] requested 4 pages as "Java/1.6.0_02"
196.25.255.195 [ctb-cache5-vif0.saix.net.] requested 1 pages as "Java/1.6.0_02"
198.54.202.210 [rba-cache2-vif1.saix.net.] requested 2 pages as "Java/1.6.0_02"
198.54.202.218 [rrba-ip-pcache-6-vif1.saix.net.] requested 2 pages as "Java/1.6.0_02"
196.25.255.214 [rrba-ip-pcache-5-vif0.saix.net.] requested 1 pages as "Java/1.6.0_02"
198.54.202.234 [rba-cache1-vif0.saix.net.] requested 3 pages as "Java/1.6.0_02"
196.25.255.194 [ctb-cache4-vif0.saix.net.] requested 1 pages as "Java/1.6.0_02"
196.25.255.250 [ctb-cache8-vif0.saix.net.] requested 1 pages as "Java/1.6.0_02"
This is a prime example of why standard bot blocking that only takes a single IP address would fail, because these are all proxy servers that claim to be forwarding on behalf of 41.240.133.235 [dsl-240-133-235.telkomadsl.co.za].
Assuming these script kiddies fix the default UA, all that needs to be done to stop them is to track access based on the proxy forward IP, which I do, and which makes stopping this kind of nonsense child's play.
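Roughly speaking, you key the hit counter on whatever the proxy says it's forwarding for instead of the connecting address. A little Python sketch of the idea (assuming the proxies send an X-Forwarded-For header, and the threshold is made up):

```python
from collections import Counter

hits_by_origin = Counter()

def origin_ip(connecting_ip, headers):
    """Prefer the forwarded client address over the proxy's own address."""
    forwarded = headers.get("X-Forwarded-For", "")
    # the first address in the chain is the original client, by convention
    return forwarded.split(",")[0].strip() if forwarded else connecting_ip

def record_hit(connecting_ip, headers, threshold=20):
    ip = origin_ip(connecting_ip, headers)
    hits_by_origin[ip] += 1
    return hits_by_origin[ip] > threshold   # True means it's time to block
```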
FYI, before anyone asks stupid questions like "How do you know it was a scraper?", it's because my page names were accessed in sequential alphabetical order. Other than being distributed among many IPs via the SAIX caching proxy, which could be hard to identify in a log file review, the rest looked like amateur hour at the scraping faire.
This is why I tell people post-mortem Apache log file reviews simply don't work: there is insufficient information to identify things that my code easily catches in real time.
Posted by IncrediBILL at 7/31/2007 06:01:00 PM | 10 comments
Saturday, July 28, 2007
Keniki Has Meltdown on Matt's Blog
The comments on most blogs aren't that amusing in and of themselves until one of the blog posters goes right off the deep end and has a meltdown.
The recipient of this meltdown and flamefest is no less than good old Matt Cutts himself.
First Matt posts that he's booted someone from his blog and Keniki chimed in about dreaming of being the recipient of such action:
keniki Said,
July 20, 2007 @ 4:23 pm
I tuned in thinking it was probably me. To be honest I’d welcome it. Its not been easy seeing one of my sites ripped apart by proxy servers , scraped bowled and hijacked and it sent me into to a over the edge at times.
Matt I think you should apply the same filter to Keniki. I am probably going to quit the net anyway and you should delete my stuff, I was pretty pissed when I wrote most of it.
OK, how does one "quit the net"?
Yank the cables off the back of the computer?
Smash the wireless Centrino chip in the laptop?
Then a couple of days later Keniki goes full tilt:
keniki Said,
July 27, 2007 @ 9:31 pm
[...] FUCK that google the site also showed hidden content and deceptive redirects. It seems rules do not apply if you show google adsense, the passport of spam.
Immediately followed by:
keniki Said,
July 27, 2007 @ 9:44 pm
Its all bullshit isn’t it google, you couldn’t give a stuff about quality results its all about the money now isn’t it. Your spam team are told not to touch results that carry google ads.
Damn!
Someone woke up with their knickers in a knot didn't they!
Looks like a self-fulfilling prophecy in action about having Matt delete your posts.
I must say I'm shocked that people would be so rude and vent at a company employee that actually tries to help people on his own time.
This is a prime example of why most company employees don't publicly admit who they work for, not on a blog anyway: it's just too dangerous to paint a bullseye on your back and invite anyone and everyone to attack you simply for being a small cog in a giant wheel.
Guess we'll just have to wait and see how this little melodrama plays out.
Anyone giving Vegas odds on whether Matt boots Keniki?
Posted by IncrediBILL at 7/28/2007 04:36:00 PM | 14 comments
Wednesday, July 25, 2007
1-More Scraper Tool
These scrapers are like locusts and here's another $19 pile of crap called 1-More Scanner that bounced off one of my sites today.
The user agent was "1-More Scanner v1.25" and it claims it can "Download images, MP3 or any file from any site!" which is an awfully big claim for something that didn't get a single page.
The only amusing part is a feature for "Proxy-support" which will just help me update my proxy list when I see it attempt to crawl via a bunch of proxy IPs, thanks for the help!
Posted by IncrediBILL at 7/25/2007 05:47:00 PM | 3 comments
Labels: Scrapers
Tuesday, July 24, 2007
Site Scraping for DreamWeaver
Now there's a DreamWeaver plug-in that makes scraping easy for dummies.
If you have no web skills just use Site Import and rip off an entire site at once.
Why learn how to design a site, create your own content, or any of that nonsense when you can just quickly download someone else's entire site instead?
This is cute:
No limit retrieval
With Site Import 2.0 you can import as many pages from a site as you'd like – no more limits!
That's a nice theory until a bot blocker shuts your import down in mid-scrape.
And my personal fave:
Learn from the pros
Learning by example is a time-honored tradition on the Web
I'm not sure that stealing is a time-honored tradition even if imitation is the sincerest form of flattery.
And last but not least:
Dynamic and database-driven sites, too!
Site Import works its magic with all kinds of Web sites – including those developed with ASP, ColdFusion, PHP or even .NET.
Grab your ankles and bend over while it extracts hundreds of thousands of pages from your database-driven site and pushes you over your monthly bandwidth allotment.
I don't know what user agent they use for this process, but I'm pretty sure my sites (not this blog) are pretty safe from this shit, except for the first few pages scraped while my code determines it's not a human at the controls.
Posted by IncrediBILL at 7/24/2007 03:40:00 PM | 6 comments
Labels: Scrapers
FuckedCompany Died in June
About a year ago I reported that FuckedCompany was fucked, but it suddenly seemed to have a little more gas left in it and they started posting regularly again. However, it looks like that gas ran out as they quit updating the site on 6/8/2007 so it's probably dead for good this time.
FuckedCompany's site owner Pud, of AdBrite fame, is still posting on his blog but it appears he's given up on FuckedCompany, so I guess I'll give up on it as well.
Guess it's time to delete that bookmark.
See ya!
Posted by IncrediBILL at 7/24/2007 01:20:00 PM | 1 comment
Sunday, July 15, 2007
Rehabilitating Massive Amounts of 404 Errors
One of my sites used to get as many as 100K 404 errors in a single month.
Leading cause of this problem?
SEARCH ENGINES!
That's correct, the #1 leading cause was search engines, but they were just a symptom of a bigger problem and not the root cause. Sloppy scrapers and crappy wannabe search engines and directories that mucked up the URLs were the true culprits. Then the major search engines crawled these sloppy sites, indexed those mucked up URLs, and that's when all the 404 fun started.
Obviously my bot blocking stopped the scraping so the source of the mucked up URLs eventually faded away but that still left a serious amount of junk in the search engine crawler queues to clean up.
Some of the links had everything from an ellipsis in the middle to fragments of a javascript OnClick() appended to the link. My personal favorites were the Windows script kiddies who didn't realize Linux servers are case sensitive and converted all my links to lower case. There were lots of other errors, but you get the point of what kind of damage can be inflicted by homemade crawlers written by incompetent assholes.
There were obvious solutions to use to clean up the search engines but those didn't address the immediate issue of visitors hitting 404 errors. Since I didn't want any actual visitors hitting these mucked up links to get a 404 error page, I set about logging and redirecting all the 404 errors that could be recovered to the actual intended page. Many of the mucked up links contained enough of the original path that I could identify the original page and put the request back where it belonged. Over a period of time the corrections began to stick in the search engines and eventually the 404 responses dwindled to a much smaller and manageable number.
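The recovery logic isn't complicated; a stripped-down sketch of the idea (the page names and the .html pattern here are made up for illustration, the real list came from the site itself):

```python
import re

# example "real" paths standing in for the actual site map
KNOWN_PAGES = {"/Products/Widget-One.html", "/About-Us.html", "/Contact.html"}
LOWER_TO_REAL = {p.lower(): p for p in KNOWN_PAGES}

def rehabilitate(request_path):
    """Return (status, location) for a request that would otherwise 404."""
    # strip trailing junk like ellipses or fragments of an OnClick() handler
    path = re.sub(r"(\.html).*$", r"\1", request_path, flags=re.IGNORECASE)
    # a case-insensitive match catches the lowercased links
    real = LOWER_TO_REAL.get(path.lower())
    if real:
        return 301, real       # send the visitor to the intended page
    return 404, None           # nothing recoverable, log it and serve the 404 page
```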
Just another reason to be diligent in blocking unwanted crawlers and scrapers, as nothing good ever came from letting them crawl.
Posted by IncrediBILL at 7/15/2007 03:58:00 PM | 0 comments
Wednesday, July 11, 2007
Are Domain Parks Playing Unfairly in Google?
John Andrews has been writing about the domainers becoming publishers:
The next wave of the competitive internet has arrrived, and it’s driven by the Domainers. No, not parked pages, and no, not typo squatters. Domainers as publishers.
After reading the post I was thinking "So what? They'll still have to fight for SE traffic just like everyone else except the added advantage of the premium domain names which will get type-in traffic and maybe rank a little better."
Well, I was sorely mistaken that it would still be even close to a level playing field as the domainers are using their domain park network to generate many thousands of backlinks in Google and Yahoo.
My initial investigation of all these backlinks in Google and Yahoo showed different links in the live sites I visited vs. Google or Yahoo cache which means they might be cloaking. The page cache always had specific links to their publisher sites on parked pages when the search engines crawled, but it'll be hard to prove it wasn't coincidence unless this situation persists over time.
The real question is why do the search engines index domain park sites in the first place?
The lame answer you'll get is "in case they turn into an actual website".
OK, crawl the sites, fine, but why should those parked pages show up in the search results or be allowed to influence page rank before they become an actual site of value?
We all know the an$wer to that que$tion a$ well.
Posted by IncrediBILL at 7/11/2007 12:52:00 PM | 2 comments
Proxy Hijacking Humor
Instead of all the serious posts about Google Proxy Hijacking it's time for a little bit of humor, very little, my apologies in advance.
Riddle:
Q: What do you call thousands of PhD's that can't stop simple proxy hijacking of your website?
A: Google!
Knock Knock Joke:
a: KNOCK KNOCK!
b: Who's there?
a: Proxy!
b: Proxy who?
a: Proxy who Google crawls through to hijack your site!
Brain Teaser:
What does the following URL represent in Google SERPs?
http://someproxysite.com/nph-page.pl/000000A/http/www.airplane.com
Answer: If you said "Airplane Hijacking" you are correct!
And now, a sad light bulb joke:
Q: How many proxy sites does it take to screw in a light bulb?
A: None. Proxy sites get Google to hijack a light bulb that's already screwed in.
More airplane humor:
Q: What's the difference between a website and a 747?
A: Proxy sites can't get Google to hijack a 747!
Last but not least...
Q: What do you call a good proxy site?
A: Offline.
Ok, you can groan, boo and hiss now.
Posted by IncrediBILL at 7/11/2007 12:10:00 PM | 1 comment
Labels: Proxy Hijacking
Sunday, July 08, 2007
Dynamic Robots.txt is NOT Cloaking!
If I read just one more post that claims using dynamic robots.txt files is a form of CLOAKING it might be enough to drive me so far over the edge that it would make "going postal" look pale by comparison.
For the last time, I'm going to explain why it's NOT CLOAKING to the mental midgets that keep clinging to this belief so they will stop this idiotic chant once and for all.
Cloaking is a deceptive practice used to trick visitors into clicking on links in a search engine and then showing them something else altogether, a bait and switch. Technically speaking, cloaking is a process where you show specific page content to a search engine that crawls and indexes your site and then show different content to the people who visit your site via those search results.
Robots.txt files are never indexed in a search engine, therefore they will never appear in the search results for that search engine, therefore a human will never see robots.txt in the search engine, click on it, and see a different result on your website.
See? NO FUCKING CLOAKING INVOLVED!
Since the robots.txt file is only for robots, and humans shouldn't be looking at your robots.txt file in the first place, showing the human "Disallow: /" is perfectly valid even though you may show an actual robot something else, as the human isn't supposed to be crawling anyway.
Let's face it, some of the stuff in our robots.txt file might be information we don't want people looking at or hacking around as it's just that: PRIVATE.
Additionally, robots.txt tells all of the other scrapers and various bad bots which user agents are allowed, so if you're allowing some less-than-secure bot to crawl your site, the scrapers can simply adopt that user agent to gain unfettered crawl access.
Dynamic robots.txt is ultimately about security, not cloaking, and nosy people or unauthorized bots that look at robots.txt are sometimes instantly flagged and blocked from further site access, so keep your nose out and you won't have any problems.
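For anyone still confused, the whole thing boils down to a few lines; a bare-bones sketch (assuming the handler already knows the user agent and has verified the crawler's IP, which is the hard part, and the bot list and rules here are just examples):

```python
TRUSTED_BOTS = ("Googlebot", "Slurp", "msnbot")   # example allow list

FULL_RULES = """User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

LOCKDOWN = """User-agent: *
Disallow: /
"""

def robots_txt(user_agent, ip_is_verified_crawler):
    # the real crawlers get the real rules
    if ip_is_verified_crawler and any(bot in user_agent for bot in TRUSTED_BOTS):
        return FULL_RULES
    # browsers, scrapers and snoops get nothing useful to work with
    return LOCKDOWN
```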
If you still think it's cloaking, consider becoming a temple priest for the goddess Hathor as a career in logical endeavors will probably be too elusive.
Posted by IncrediBILL at 7/08/2007 10:52:00 PM | 71 comments
Saturday, July 07, 2007
Too Much FyberSpider In My Site's Diet
Found this FyberSpider thing that used to crawl from a Comcast address and has apparently grown up and is crawling from a real dedicated server now.
The IP was 69.36.5.45 and the reverse DNS claims to be server.fybersearch.net, and sure enough there's something called FyberSearch with what appears to be a functional search page. The results actually appear to be populated with data collected from their crawl; trade secret, don't ask.
69.36.5.45 "GET /robots.txt HTTP/1.0" "Python-urllib/1.15"
69.36.5.45 "GET / HTTP/1.0" "FyberSpider"
Here's the data center info if you want to block it:
OrgName: JTL Networks Inc.
NetRange: 69.36.0.0 - 69.36.15.255
The search page has issues finding words in the one page I allowed to be indexed so I'm not terribly impressed, NEXT!
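If you do want to drop that whole allocation, it's a one-line range check; a tiny sketch (that NetRange works out to 69.36.0.0/20):

```python
import ipaddress

BLOCKED = ipaddress.ip_network("69.36.0.0/20")   # 69.36.0.0 - 69.36.15.255

def is_blocked(ip):
    return ipaddress.ip_address(ip) in BLOCKED

# is_blocked("69.36.5.45") -> True
```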
Posted by IncrediBILL at 7/07/2007 01:43:00 PM | 3 comments
Thursday, July 05, 2007
Al Gore's Son Arrested in Harrowing Hybrid Hijinx
I've never let anyone else post a guest article here before but this is just so true and so funny it needed to be shared with my readers.
Enjoy.
Guest post by Larry.
so al gore's son got arrested. again. the story has one detail that is so unbelievable that they should probably throw the entire case out.
is it unbelievable that al gore's son was arrested?
no
is it unbelievable that al gore's son was arrested again? for the second or third time?
no
is it unbelievable that al gore's son was arrested for the third time on penny ante drug charges?
no
is it unbelievable that al 3 was smoking marijuana in his car in the middle of the night?
no
is it unbelievable that he had some prescription drugs in the car with him?
no
is it unbelievable that some drugs includes quantities of xanax, valium, vicodin, adderall and soma?
no
is it unbelievable that of the prescriptions for some xanax, valium, vicodin, adderall and soma, none were in his name?
no
is it unbelievable that he was driving at 2 a.m.?
no
is it unbelievable that he was driving his prius at 100 miles per hour?
damn right it is.
100 mph in a prius? maybe if he drove it off a cliff and it was in free fall or scotty was beaming it up. down the road with tires on the pavement, i'd have to see it to believe it. clearly the whole case lacks probable cause for the traffic stop. it's a set up. bush making sure al doesn't get in the race. cause you know in this country you can't be president if your son is a jackass. wait so how did 41 get in? case dismissed, bogus traffic stop. they should have said, failed to signal a lane change like they usually do when they want to do illegal stops.
Posted by IncrediBILL at 7/05/2007 11:18:00 PM | 0 comments
Tuesday, July 03, 2007
Google Proxy Hijacking - Myths, Urban Legends and Raw Truths
If you aren't a regular Webmaster World reader then you probably missed the most recent incarnation of the Google Proxy Hijacking discussion, where I had to step in and correct a lot of misinformation about proxy hijacking.
Go read the following:
Proxy Server URLs Can Hijack Your Google Ranking
Lots of good information there once you weed through all the misconceptions.
If you read that entire thread and still have any questions, feel free to ask!
Posted by IncrediBILL at 7/03/2007 10:42:00 PM | 14 comments
Labels: Proxy Hijacking
Thursday, June 28, 2007
Dear Amazon AWS Group Part Deux
Back in November I wrote an open letter to the Amazon AWS Group about trying to get them to stop using the default user agent "Java/1.5.0_09".
Today I noticed that they gave me a clear response to my open request:
216.182.228.223 [domU-12-31-33-00-02-01.usma1.compute.amazonaws.com.]
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461) Java/1.5.0_09"
Oh yes, prefixing "Java/1.5.0_09" with an MSIE 6.0 user agent is MUCH better.... NOT!
Must've been getting blocked from crawling too many sites that block the default Java UA.
Nice try guys, but that's really fucking lame.
Posted by IncrediBILL at 6/28/2007 01:33:00 PM | 1 comment
Tuesday, June 26, 2007
Easy To Spot AlphaServer Botnet
Sometimes when a distributed botnet hits your site it's quite trivial to spot their collective effort because they're using a slightly offbeat user agent that's not terribly common in the first place combined with the associated speed and time of access.
Here's the IPs and user agent used:
76.190.183.150 [cpe-76-190-183-150.neo.res.rr.com.]
"Mozilla/4.0 (compatible; MSIE 4.01; Digital AlphaServer 1000A 4/233; Windows NT; Powered By 64-Bit Alpha Processor)"
71.205.86.12 [c-71-205-86-12.hsd1.mi.comcast.net.]
"Mozilla/4.0 (compatible; MSIE 4.01; Digital AlphaServer 1000A 4/233; Windows NT; Powered By 64-Bit Alpha Processor)"
67.160.41.82 [c-67-160-41-82.hsd1.wa.comcast.net.]
"Mozilla/4.0 (compatible; MSIE 4.01; Digital AlphaServer 1000A 4/233; Windows NT; Powered By 64-Bit Alpha Processor)"
70.224.38.36 [adsl-70-224-38-36.dsl.sbndin.ameritech.net.]
"Mozilla/4.0 (compatible; MSIE 4.01; Digital AlphaServer 1000A 4/233; Windows NT; Powered By 64-Bit Alpha Processor)"
75.84.251.65 [cpe-75-84-251-65.socal.res.rr.com.]
"Mozilla/4.0 (compatible; MSIE 4.01; Digital AlphaServer 1000A 4/233; Windows NT; Powered By 64-Bit Alpha Processor)"
72.232.65.34 [72.232.65.34.svservers.com.]
"Mozilla/4.0 (compatible; MSIE 4.01; Digital AlphaServer 1000A 4/233; Windows NT; Powered By 64-Bit Alpha Processor)"
That little group of IPs all hit within 2 minutes of each other and came from both hosting centers and residential locations, definitely a collaborative effort, most likely a botnet.
I've seen more little attacks/scrapes like this than you can imagine, but this particular user agent struck me as amusing since it's almost a desperate cry to get caught, like they're flaunting it in our faces that so many of our machines are hacked.
Posted by IncrediBILL at 6/26/2007 10:31:00 AM | 1 comment
Labels: Bad User Agents, Bot Nets
Thursday, June 21, 2007
Javascript Cloaked Spam Pages Baffle Search Engines
Recently I ran across a large series of scraper sites that are the ultimate in openly cloaking to the search engines. The pages I see when I view the source are the same pages cached by the search engines, nothing special there, so a search engine crawling outside its IP range to check for cloaking would see the same page.
However, access those pages with javascript enabled and you are instantly redirected to a wide variety of affiliate pages. The trick is these pages all have a single embedded link to a heavily obfuscated page of javascript that redirects you to the affiliate pages.
The scraping to build these cloaked pages came from 216.75.15.26 which is in the cari.net IP range:
OrgName: California Regional Intranet, Inc.
NetRange: 216.75.0.0 - 216.75.63.255
Just goes to show you that traditional cloaking is a thing of the past as the war has escalated into obfuscated javascript. The only way I see the search engines winning this war is to actually execute that javascript and see if the resulting action was to take the visitor away from the page.
It also goes to show that the people claiming in the comments recently that "Stealth crawling is necessary to keep honest webmasters honest" are out of their league and don't really know the score on the web, as these sites aren't honest even when crawled in plain sight, no stealth needed, they simply worked around it.
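If I wanted to run that javascript-execution check myself, a scripted browser is about the only way; a crude sketch of the idea (using Selenium purely as an illustration, not anything the search engines actually run, and a redirect to the same URL with only a trailing slash difference would need a smarter comparison):

```python
import time
from selenium import webdriver

def js_redirects(url, settle_seconds=5):
    """Load the page with javascript enabled and see if it ends up somewhere else."""
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        time.sleep(settle_seconds)        # give the obfuscated script time to fire
        return driver.current_url != url  # True means the visitor got whisked away
    finally:
        driver.quit()
```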
Wonder what they'll think up next?
Posted by IncrediBILL at 6/21/2007 11:50:00 AM | 6 comments
Labels: Damn Spam
Saturday, June 16, 2007
Blog Feed Messed Up
I just noticed that the Blogger feed is all messed up: my reorganizing of old posts into categories and such appears to also dump them into the feed as something new.
Stupid blogger.
Sorry for the problem, but there doesn't appear to be much I can do about this.
Be prepared for a bumpy ride of summer reruns as I organize the blog!
Posted by IncrediBILL at 6/16/2007 02:44:00 PM | 2 comments
Contact Us Form Spammers
Well boys and girls, you didn't really think that hiding your email address behind a CONTACT US form would stop spammers did you?
I have all of the forms on my website protected except one page, which I left wide open with no protection just so anyone having trouble with the site can easily contact me. That page has just a simple form: no captcha, no referrer checks, no bot blocking, nothing. It's completely open as a safety valve for end users.
However, some dick head in Oman with nothing better to do has apparently decided to make it his personal goal in life to automatically post to this form.
You have to ask yourself, why is this random form page so important?
The answer is obvious: everyone hides behind CONTACT US forms and no longer posts email addresses, so the spammers can no longer harvest addresses from your web pages. Now it would appear they are harvesting any page with a FORM on it and working out the parameters that let them submit spam through all of these forms.
I don't run any off-the-shelf Open Source software so there is no software fingerprint on any of my pages that the mass spammers could easily find, so this is an act of desperation in manually building a bigger database of sites to spam.
Just to prove this theory, I checked to see what else this spammer was trying to do on my site besides trying to spam my contact page. Big shock, the same IP address is trying to spam the other protected pages.
Here's some other info collected from the same IP:
62.231.243.137 "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115 Galeon/1.3.12" "massive dick sex" http://bratuha.info
62.231.243.137 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "Online tramadol. Cheap tramadol." http://
I never see any of the above junk in my Inbox or anywhere else as it's all submitted on protected pages, so a little information is automatically logged and the rest of the crap discarded.
So how can I protect this form from automation and still leave it open to not impact other visitors?
We'll use one of my old favorites, a simplistic but effective approach: RANDOM FIELD NAMES. Each time the form is displayed the field names change, so the spammer can't pre-program any code to automatically populate the fields because he won't know their names.
An argument could be made that the spammer could read the page and use the field position, but that would assume the position in the HTML is the same as the position on the page, good old CSS to the rescue.
If I want to really make it just about impossible for the spammer to figure out the page and still not use javascript or a captcha, I might use 10-20 random fields with only 3 of them chosen at random to be visible so the user would never know the difference.
Golly gee Mr. Spammer, which of those 20 random fields should you fill in?
Be careful because filling the wrong field, the field the visitor can't see, is yet another form of CAPTCHA, so choose your field wisely otherwise you're automatically going to be banned.
Maybe to be real sneaky, I'll just add new fields to the form and leave the old obsolete fields on the page so if they get filled in I know it's an old spammer script.
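Put together, the whole trick fits in a few lines; a simplified sketch of the concept (the field names, decoy count, and storage are placeholders, and the real form state gets stashed server-side between serving the page and receiving the POST):

```python
import secrets

def build_form_fields(real_fields=("name", "email", "message"), decoys=17):
    """Random name -> real field mapping, plus decoy fields to hide via CSS."""
    mapping = {secrets.token_hex(8): real for real in real_fields}
    traps = [secrets.token_hex(8) for _ in range(decoys)]   # must come back empty
    return mapping, traps

def validate_post(posted, mapping, traps):
    """posted: dict of submitted field names -> values."""
    if any(posted.get(trap) for trap in traps):
        return None   # a field no visitor could see got filled in: spammer, ban it
    # translate the random names back to the real field names
    return {real: posted.get(rand, "") for rand, real in mapping.items()}
```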
Just remember, keeping your email address off the web site doesn't mean you won't get spammed so secure those contact pages today!
Posted by IncrediBILL at 6/16/2007 12:10:00 PM | 11 comments
Labels: Damn Spam
Friday, June 15, 2007
Doctor Zero Goes Scraping
Some scraper used all zeros in place of the parameters normally found in an MSIE or Firefox browser user agent.
Just look at this stupid crap:
86.21.47.45 "Mozilla/5.0 (000000000; 0; 000 000 00 0 000000; 00000; 0000000000) 00000000000000 000000000000000"
86.21.47.45 "Mozilla/5.0 (000000000; 0; 000 000 00 0; 00) 000000000000000 0000000 0000 000000 000000000000"
You know what he got for his efforts?
A big fat fucking ZERO in return, nada, zip, zilch, goose egg.
I'll bet he got the same number as a grade on his computer science project in school too!
Posted by IncrediBILL at 6/15/2007 06:17:00 PM | 14 comments
Labels: Scrapers