Comment Spam Solutions
Well, it finally happened to my blog - I've been getting hit with comment spam every day for the past week. I thought I would be safe for a while longer, since my backwater blog is still pretty unknown in the greater blogosphere. But those spammers are crafty...
Many bloggers - including those who use .TEXT to run their blog like I do - have had this problem recently and have come up with various ideas. I've been doing some research into solutions and they seem to fall into two general categories. I've listed the ones I've found so far below for future reference as I try some of them out, and to help others who are looking for this information. Of course the ultimate solution is to simply turn comments off, but then that's no fun! Similarly, some have tried disabling comments on all posts older than XX days, which reduces the volume of comment spam but usually doesn't eliminate it.
- The first solution is to implement an "Are you a human?" check, since the vast majority of comment spam comes from automated scripts. This is typically a CAPTCHA image that makes you type the letters, numbers, and/or symbols you see in the image into a textbox. This seems to be extremely effective, but it can be a minor pain for readers and I'm a little worried it may lower the number of comments received. Not that I get many anyway, heh. =)
  - Miguel Jimenez created the Clearscreen SharpHIP HIP-CAPTCHA Control, which is (AFAIK) the easiest and most widely implemented CAPTCHA solution for .TEXT blogs.
  - Meandering-Blog.Com has a CAPTCHA solution that involves changing a couple of aspx files. It is derived from Chris Kunicki's article on implementing a CAPTCHA image for .TEXT.
  - There are also commercial CAPTCHA controls that you can buy. The first one I found was ByteSize FormGuard, although I'm sure there are others.
- The second solution is to implement some kind of filter to check the comment text after it is submitted. This can either be done in the .NET code on postback, or in SQL when/after the comment is saved to the database (via stored procedure, triggers, or scheduled job). Now obviously this solution is only as good as your filter algorithm, similar to spam filters in your email client. A couple implementations of this I've seen out in the wild are:
  - Count the number of links (hrefs) in the comment. Most comment spam (including all that I've received) has numerous links embedded in it. netnerds.net has a great example of how to do this in the database.
  - Look for keywords or phrases that most comment spam contains, and reject any comment that contains them. This would probably lead to the most false positives, in my opinion. John Sample has an example of this.
  - A smarter way to do content analysis is to use Bayesian filtering, similar to what SpamBayes does. I saw a plugin for the Movable Type blogging system that used that, but haven't heard of any .TEXT solutions using this technique.
  - Consult one or more blacklists, which are lists of known IP addresses or domains that spammers have used in the past.
    - Robert McLaws created the CommentSpam.org site for saving the IP addresses of known comment spammers. It has web services available to check an IP address against its list, which currently consists only of comment spam received on his sites.
    - The MT-Blacklist/Comment Spam Clearinghouse also has a very large comment spam blacklist. Although this is primarily targeted at Movable Type blogs, the blacklist is available for anyone to use.
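
To make the filter idea a bit more concrete, here is a rough sketch of how the link count, keyword, and blacklist checks might be combined. It is written in Python purely for brevity; a .TEXT blog would do the equivalent in .NET on postback or in SQL. The threshold, phrase list, IP address, and function names are all made up for illustration and do not come from any of the implementations linked above.

```python
import re

# Hypothetical settings, chosen only for illustration.
MAX_LINKS = 2
BLOCKED_PHRASES = ("online casino", "cheap pills")
BLACKLISTED_IPS = {"203.0.113.7"}  # documentation-range address, not a real spammer

# Matches the start of an HTML anchor tag that carries an href attribute.
HREF_PATTERN = re.compile(r"<a\s[^>]*href\s*=", re.IGNORECASE)


def count_links(comment_html):
    """Return the number of <a ... href=...> tags found in the comment."""
    return len(HREF_PATTERN.findall(comment_html))


def looks_like_spam(comment_html, sender_ip):
    """Apply the three checks in turn; any single hit flags the comment."""
    if sender_ip in BLACKLISTED_IPS:
        return True
    if count_links(comment_html) > MAX_LINKS:
        return True
    lowered = comment_html.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


if __name__ == "__main__":
    sample = '<a href="1">x</a> <a href="2">y</a> <a href="3">z</a>'
    print(looks_like_spam(sample, "198.51.100.1"))
    print(looks_like_spam("Nice post!", "198.51.100.1"))
```

Note that this only counts anchor tags, so bare URLs pasted as plain text would slip past the link check, and a naive phrase match like this is exactly where the false positives mentioned above would come from.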
