March 22, 2021

HPR3296: Spam Bot Honey Pot

In this episode of Hacker Public Radio, I will describe the method I chose to combat spam bots filling out my company's contact form. About 99% of the submissions we receive are spam, which makes filtering for valid messages painful. After some research into different methods, I decided to go with the honey pot method.

The honey pot method uses an extra text input field to lure the spam bot into filling it out. There are different suggestions for how to hide this extra field from valid users by using either javascript or CSS. With javascript, the honey pot section of the form is removed from the DOM when the page loads, hiding it from your users. The argument for this method is most bots don't implement javascript, so the honey pot field will not be hidden from them. I think that is a valid argument but I didn't want to include extra javascript in my page--so I went with the CSS method.

There are references at the end of the show notes to a couple of the articles I read on implementing the honey pot with either javascript or CSS. My take away was, one, don't use the CSS display property set to the value of none to take the input out of the DOM. Sufficiently smart enough bots may know to scan for this, especially if applied directly to the element. Also don't name your classes something obvious to your intent like "anti-spam-filter". My guess is the majority of the bots out there aren't that sophisticated, but I figured it couldn't hurt to follow those suggestions.

I was already using Bootstrap CSS for our site, so I decided to use Bootstrap's "sr-only" class. This class is used for elements that you only want visible to screen readers. It takes the element and uses a combination of absolute positioning, setting the size and width to 1 pixel, setting a negative left margin, and hiding content overflow to prevent the honey pot showing up visually. I figured if the bot was scanning CSS for classes or properties, this wouldn't trigger any warnings. It does bring up the issue of how to prevent impacting the experience of people using screen readers. I applied the aria-hidden attribute with a value of true to the label element surrounding the honey pot input field. "[this] removes that element and all of its children from the accessibility tree." So we now have the field hidden both visually in the browser and from assistive technologies. Given the short end of the stick accessibility usually gets, I doubt there are any spam bots scanning for that ARIA attribute. For the minority of users who might be viewing with the classic lynx browser, I put 'For office use' as the label text before the honey pot, hoping this would get the message across without tipping off the bot to the intended purpose of the related input field.

The other main issue with this method is the value of the name attribute used for the input field. Some argue to use obfuscated values like "mmxxName" instead of "name", or "sxysPhone" for "phone". Apparently some bots will skip fields they don't recognize. By using more standard names for multiple honey pot fields, it easier to determine if it is a bot. The counter argument to this naming scheme is about the user experience, by obfuscating the name, then browsers won't auto-fill the valid fields of the form. This also brings up the matter of not auto-filling the spam fields by the browser of your users. This is done by setting any of your honey pot input elements' "autocomplete" attributes to "off".

So far this spam filtering method is working nicely. I currently send any messages flagged as spam to a different email address with the subject prepended with the words "[Spam review]". Once I am confident there are not that many false positives, I will just skip sending flagged messages. The one issue I have experienced with this method is when using the tab key to move through the form. Since the input field is only visually hidden, it still receives focus as you tab through.

...more