Excellent article on a clever way to kill spammers (except for the human ones, of course...) here.
Comments: When doing form submissions, I have four rules.
1. Do not rely on JavaScript for error checking. Spam scripts do not run JavaScript, so they skip those checks entirely. Error checking should be done on submit, which then kicks the user back with a message (or displays a JavaScript popup before the forced back). If you do use JavaScript, you must then disallow access to the form when JavaScript cannot run. You can do this by generating the form from JavaScript. This is a pain in the arse.
2. Robots.txt must specifically deny access to submit.php or any of your script files. If you follow 1, this should not be a problem, since the error checking will catch the intruding script in action. You may also deny access to the form, but the form might be on a main page, so this is not always the best idea.
3. Reject blank submissions always.
4. If you need a secure form, the page which contains the form must be secure and not just the submission script.
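Rules 1 and 3 can be sketched as a single server-side check. This is only a minimal illustration in Python, not anyone's actual submit script: the field names in REQUIRED_FIELDS and the function name validate_submission are made up for the example, and the form is assumed to arrive as a plain dictionary of strings.

```python
# Hypothetical required fields; a real form would define its own.
REQUIRED_FIELDS = ("name", "email", "message")


def validate_submission(form):
    """Return a list of error messages; an empty list means the form passed."""
    # Rule 3: reject a submission in which every field is blank.
    if not any(str(value).strip() for value in form.values()):
        return ["Blank submission rejected."]

    # Rule 1: do the error checking on the server, at submit time,
    # so it still runs when the client never executed any JavaScript.
    errors = []
    for field in REQUIRED_FIELDS:
        if not str(form.get(field, "")).strip():
            errors.append(f"Missing required field: {field}")
    return errors
```

A non-empty result is what would be shown to the user when they get kicked back to the form.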
There are basically two kinds of script bots. First you have crawlers, which will probably obey robots.txt and are just indexing. Rule 3 is just in case the crawler doesn't obey robots.txt (or you forget rule 2). But again, rule 1 may cover this whole deal fairly well. Rule 2 is just to make sure that you don't get things like Google indexing your script files, which are useless to someone searching for your site. The other kind is spambots, which are trying to insert hyperlinks somewhere, whether it be in your email inbox, comments sections, etc. Spambots will usually try to fill form fields with useless data just to get the form submitted. Without knowing which fields are 'required', their best bet is just to fill 'em all up.
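For the crawler side, one way to sanity-check rule 2 is to run your robots.txt rules through Python's standard urllib.robotparser and see what a well-behaved crawler would be allowed to fetch. The rules below are an assumed example (the /scripts/ directory and the ExampleBot agent name are invented); only submit.php comes from the rules above.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: deny the submission script and a script directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /submit.php
Disallow: /scripts/
"""


def crawler_may_fetch(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Report whether a robots.txt-obeying crawler would fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

This only tells you what a polite crawler will do; anything that ignores robots.txt still has to be caught by the submit script itself.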
Robots.txt, which I've now mentioned about twice in this post, together with rejecting blank submissions in the submission script, should keep crawlers out of your hair and indexing what you want them to. The usual method of dealing with spambots is captcha, which typically requires a database.
Basically, captcha generates a key (hash) and a phrase (often random letters or numbers). The captcha script stores this pair in a database of some kind (secure, hopefully!) with expiry information so the captcha can't be brute-forced by a foreign form. The script then places the key in a hidden form field and converts the phrase into an image with some kind of minor noise; the actual text is not in the form. The human user, able to pick letters and numbers out of a fairly noisy image, can probably figure out what it says and types that in. The bot merely sees an image file (which it may not even have a way to parse) and a form field; it is thus stuck unless the image is simple and the bot is a sophisticated script.
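The bookkeeping half of that description (key, phrase, expiry) can be sketched roughly as follows. This is an illustration under loud assumptions: an in-memory dictionary stands in for the database, the five-minute expiry and six-character phrase are arbitrary choices, the function names are invented, and rendering the phrase into a noisy image is left out entirely.

```python
import secrets
import string
import time

CAPTCHA_TTL_SECONDS = 300  # assumed expiry window, not from the original post
_store = {}  # key -> (phrase, expires_at); stands in for the database


def issue_captcha(now=None):
    """Create a key/phrase pair; the key goes in a hidden field, the phrase in the image."""
    now = time.time() if now is None else now
    key = secrets.token_hex(16)
    alphabet = string.ascii_uppercase + string.digits
    phrase = "".join(secrets.choice(alphabet) for _ in range(6))
    _store[key] = (phrase, now + CAPTCHA_TTL_SECONDS)
    return key, phrase


def check_captcha(key, answer, now=None):
    """Accept the answer only once, and only before the pair has expired."""
    now = time.time() if now is None else now
    record = _store.pop(key, None)  # popping makes each key single-use
    if record is None:
        return False
    phrase, expires_at = record
    if now > expires_at:
        return False
    typed = answer.strip().upper().encode("utf-8")
    return secrets.compare_digest(typed, phrase.encode("utf-8"))
```

Removing the record on the first check means a wrong guess burns the key, which is one way to keep a foreign form from brute-forcing it.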
This is a positive-space security, meaning 'who you know, what you are, what you have, what you know' is tested affirmatively. We want to know you're a human, so we make you do something that a human can do but a script cannot. (This is a 'what you are' test.)
His method is a negative-space security. Instead of figuring out if you're a human, he figures out if you're a script. If you show you're a script by doing something that a script WILL do, you're denied access. Or, put the other way, you are permitted access as long as you don't do something a human won't do. Kind of Blade Runner-esque. At least they don't ask you what you would do if you were at a party and they were serving dog...
Error: You are insufficiently human to post on this website, synthetic!
Much like how raccoons are caught by putting a coin in a hole with a nail protruding into it. Our furry brethren are far too determined to get the coin to let go.
Ok, you let go of the coin to get your hand out. We know you're not a raccoon. You may post on this website!
Darn security features...
The other benefit to this method is that it doesn't require a database implementation - just an extra test on your submit script.
(Note: my distinction does not change the fact that we're testing for what you are, but the test itself is done in such a fashion that a positive response to the test (filling in the field) results in a negative response by the gateway.)
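That extra test on the submit script might look something like this. It is a sketch that assumes the trap is a form field hidden from human visitors, so that only a script would fill it in; the field name "website" and both function names are hypothetical, and the article's actual implementation may differ.

```python
# Hypothetical name of the trap field that humans never see or fill in.
HONEYPOT_FIELD = "website"


def looks_like_script(form):
    """A filled-in trap field is the thing a script WILL do and a human won't."""
    return bool(str(form.get(HONEYPOT_FIELD, "")).strip())


def handle_submission(form):
    """Positive response to the test (field filled) gives a negative response from the gateway."""
    if looks_like_script(form):
        return "rejected"
    return "accepted"
```

No key, no phrase, and nothing to store between requests, which is why no database is needed.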
Later.