A blog about .NET, Graffiti, Community Server, and Kevin's life

Comment Spam Solutions

Well it finally happened to my blog - I've been getting hit with comment spam every day now for the past week. I thought I would be safe for a while longer, since my backwater blog is still pretty unknown in the greater blogosphere. But those spammers are crafty...

Many bloggers - including those who use .TEXT to run their blog like I do - have had this problem recently and have come up with various ideas. I've been doing some research into solutions and they seem to fall into two general categories. I've listed the ones I've found so far below for future reference as I try some of them out, and to help others who are looking for this information. Of course the ultimate solution is to simply turn comments off, but then that's no fun! Similarly, some have tried disabling comments on all posts older than XX days, which reduces the volume of comment spam but usually doesn't eliminate it.

  1. The first solution is to implement an "Are you a human?" check, since the vast majority of comment spam comes from automated scripts. This is typically a CAPTCHA image that makes you type the letters, numbers, and/or symbols you see in the image into a textbox. This seems to be extremely effective, but it can be a minor pain for readers and I'm a little worried it may lower the number of comments received. Not that I get many anyway, heh. =)

  2. The second solution is to implement some kind of filter to check the comment text after it is submitted. This can be done either in the .NET code on postback, or in SQL when/after the comment is saved to the database (via stored procedure, triggers, or scheduled job). Now obviously this solution is only as good as your filter algorithm, similar to spam filters in your email client. A few implementations of this I've seen out in the wild are:
    • Count the number of links (hrefs) in the comment. Most comment spam (including every one I've received) has numerous links embedded in it. netnerds.net has a great example of how to do this in the database.
    • Look for keywords or phrases that most comment spam contains, and do not allow a comment if it contains those keywords/phrases. This would probably lead to the most false positives in my opinion. John Sample has an example of this.
    • A smarter way to do content analysis is to use Bayesian filtering, similar to what SpamBayes does. I saw a plugin for the Movable Type blogging system that used that, but haven't heard of any .TEXT solutions using this technique.
    • Consult one or more domain blacklists, which are lists of known IP Addresses or domains that spammers have used in the past.
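
Here's a rough sketch of how the link count, keyword, and blacklist checks above might fit together. It's written in Python rather than .NET or SQL just to keep it short. The threshold, the phrase list, the blacklisted domain, and all of the function names are made up for illustration and are not taken from any of the implementations mentioned above.

```python
import re
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical settings, chosen only for illustration.
MAX_LINKS = 2
BLOCKED_PHRASES = ("online casino", "cheap pills")
BLACKLISTED_DOMAINS = {"spam.example"}

# Captures the value of an href attribute, quoted or unquoted.
HREF_RE = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def extract_links(comment: str) -> List[str]:
    """Return the target of every href found in the comment text."""
    return HREF_RE.findall(comment)


def link_domain(url: str) -> str:
    """Return the lowercased host name of a URL, or '' if there is none."""
    try:
        return (urlparse(url).hostname or "").lower()
    except ValueError:
        # urlparse rejects some malformed URLs; treat them as having no host.
        return ""


def is_blacklisted(domain: str) -> bool:
    """True if the domain is a blacklisted domain or a subdomain of one."""
    return any(domain == bad or domain.endswith("." + bad)
               for bad in BLACKLISTED_DOMAINS)


def spam_reason(comment: str) -> Optional[str]:
    """Return why the comment looks like spam, or None if it passes."""
    links = extract_links(comment)
    if len(links) > MAX_LINKS:
        return "too many links"
    lowered = comment.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return "blocked phrase"
    if any(is_blacklisted(link_domain(url)) for url in links):
        return "blacklisted domain"
    return None
```

A caller would reject or hold for moderation any comment where `spam_reason(text)` is not `None`. The simple substring match on phrases is exactly the kind of check that can produce false positives, so the phrase list would need to be kept short and specific.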

» Comments

    I found two other CAPTCHA implementations after writing this post. They are from Kevin Gearing [http://www.dotnetfreak.co.uk/blog/archive/2004/11/06/166.aspx] and Dave Burke [http://dbvt.com/blog/archive/2004/12/02/579.aspx]

    Kevin Harder — January 16, 2005 9:26 PM
