My Blog, Blogging
I hate comment spam, but I found this in my moderation queue this morning… I marked it as spam and junked it, but I thought I’d preserve it here for everyone to see…
hello , my name is Richard and I know you get a lot of spammy comments ,
I can help you with this problem . I know a lot of spammers and I will ask them not to post on your site. It will reduce the volume of spam by 30-50% .In return Id like to ask you to put a link to my site on the index page of your site. The link will be small and your visitors will hardly notice it , its just done for higher rankings in search engines. Contact me icq 454528835 or write me tedirectory(at)yahoo.com , i will give you my site url and you will give me yours if you are interested. thank you
So, he can reduce my comment spam by asking the spammers nicely, and in return I get to spam my home page for him? Wow, seems like a good deal to me! Pity for him that Akismet works so well that I never see the majority of my spam.
Since I swapped over to WordPress I’ve had far fewer problems with comment spam. It may be that the SpamBots haven’t yet figured out that I’m not running MT anymore, but I’ve got WP configured with a couple of options to help block the worst of the spammers (block comments posted via open proxies), and of course I’m still using my IPTables spam firewall.
Now as an extra line of defence I’ve added a captcha to the comments form. I used the AuthImage plugin for WordPress with a couple of tweaks to the veriword.ini to make the image a little more readable. I’m still not 100% happy with it, as it still generates unreadable images at times, but most of the images are very readable so that will have to do for the moment.
Hopefully this should cut out the last of the automated spam.
I’ve now changed my blog over to WordPress. I’d been using WordPress on my wife’s site for a while now and liked it quite a bit. My MT install was getting dated, and very prone to spam. Instead of updating to MT 3, I decided to make the switch to WordPress instead.
A couple of things impressed me about WordPress. The first was the Codex, which is a great source of assistance for all things WordPress and was a great help in the changeover to WP. There’s also a large number of themes available at alexking.org: WordPress Themes. The site is currently running the “rin” theme. I’ll hack that in coming weeks to add AdSense ads again. Eventually I’ll roll my own theme for something more distinctive, but this’ll do for now.
Finally, I worked a bit of mod_rewrite magic so that virtually all of the old URLs still work. If you read the site through the RSS feed, the old URLs still work, so there’s no need to resubscribe. If there are any dramas, leave a comment on this post and I’ll try to sort things out.
Although my blog has been devoid of fresh content for the last few months, one thing that hasn’t changed has been comment spam. I’ve spent a considerable amount of time keeping the blog as spam free as possible, and struck a few problems along the way.
The biggest one has been the size of the blacklist used by mt-blacklist. It seems that the version of mt-blacklist I’m using (1.6.5) has issues when the blacklist reaches a certain size. At around 4020 entries, mt-blacklist will no longer write to the blacklist, making adding new URLs and despamming very difficult.
Another issue I found, via some simple log analysis, was that comment and trackback spam made up a very significant portion of my website traffic (40-50%). So I decided it was time to take a more drastic approach. Here’s what I did:
Extract the IP addresses of all comment and trackback submissions from my Apache logs:
cat /var/log/httpd/jim-access_log* | grep mt-comments.cgi | grep POST | cut -f1 -d" " > comment_ips.txt
cat /var/log/httpd/jim-access_log* | grep mt-tb.cgi | grep POST | cut -f1 -d" " >> comment_ips.txt
Using a small perl script (code in the extended entry), I extracted the 20 worst offenders:
cat comment_ips.txt | ./comment_ip_sum.pl | head -n 20
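The Perl script itself isn’t reproduced here, so as a rough stand-in, the same kind of tally can be sketched with standard tools. This is not the script from the extended entry; the function name top_offenders is made up for illustration, and it assumes comment_ips.txt holds one IP address per line, as produced by the commands above.

```shell
# Count how many times each IP address appears in a file and list the
# busiest ones first. $1 is the file of IPs, $2 is how many to show
# (defaults to 20). The name "top_offenders" is hypothetical.
top_offenders() {
  sort "$1" | uniq -c | sort -rn | head -n "${2:-20}"
}

# Example: top_offenders comment_ips.txt 20
```

Each output line is a count followed by the IP address, so the worst offenders end up at the top of the list.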
Finally, I created a custom chain in iptables on my web server, and added rules to DROP all packets from the worst of the spammers. It’s interesting to note that each of the Top 5 IPs was posting at least 10x the amount of spam as the remaining IPs on the list.
iptables -N web_filter
iptables -A web_filter -s <spammer-ip> -j DROP
iptables -A web_filter -j ACCEPT
iptables -I INPUT 1 -p tcp --dport 80 -j web_filter
I could probably automate the creation and maintenance of firewall rules a little more, but that’s good enough for me.
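For what it’s worth, here is a minimal sketch of what that automation might look like. It is only an illustration, not something I’ve described running: the function name gen_web_filter_rules is made up, it assumes the web_filter chain already exists and is hooked into INPUT as above, and it only prints the iptables commands so they can be reviewed before anything touches the firewall.

```shell
# Print (not run) the iptables commands that rebuild the web_filter chain
# from a list of IP addresses, one per line on stdin.
# The name "gen_web_filter_rules" is hypothetical.
gen_web_filter_rules() {
  # Start from an empty chain so stale entries do not pile up.
  echo "iptables -F web_filter"
  while read -r ip; do
    # Skip blank lines and anything containing characters other than
    # digits and dots (a crude guard, not full IPv4 validation).
    case "$ip" in
      ''|*[!0-9.]*) continue ;;
    esac
    echo "iptables -A web_filter -s $ip -j DROP"
  done
  # Everything not explicitly dropped is accepted.
  echo "iptables -A web_filter -j ACCEPT"
}
```

Fed a file of addresses to block, this prints one DROP rule per address between the flush and the final ACCEPT. Once the output looks right it could be piped to sh as root.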
The perl script I use and the current block list are in the extended entry to this post.
By and large my blog is pretty resistant to comment spam. I ran MT 2.51 and MT-Blacklist 1.62. A few spam comments slip through the blacklist each day, but the vast majority gets stopped by the blacklist.
Recently though, a new breed of spam started appearing that used a form of HTML character encoding to hide the URL (e.g. &#111; instead of an o in the URL). This went clean through MT-Blacklist. Blacklisting the encoded version of the URL wouldn’t help because our friendly neighborhood spammers would just encode a different character in the same URL, and we’re back to square one.
This, I thought, shouldn’t be a difficult problem to fix, so I figured I’d grab the latest MT-Blacklist 1.65 and start hacking. Lo and behold, 1.65 already addresses the problem!
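To illustrate the general idea (and this is only a sketch, not how MT-Blacklist actually does it), the trick is to decode the character entities before comparing against the blacklist, so every encoding of the same URL collapses to one string. The function name decode_entities is made up, and this version only handles decimal entities like &#111;, not hex or named ones.

```shell
# Decode decimal HTML character entities (e.g. &#111; -> o) in each line
# read from stdin. The name "decode_entities" is hypothetical.
decode_entities() {
  awk '{
    out = ""
    # Repeatedly find the next &#NNN; entity and replace it with its character.
    while (match($0, /&#[0-9]+;/)) {
      code = substr($0, RSTART + 2, RLENGTH - 3)
      out = out substr($0, 1, RSTART - 1) sprintf("%c", code + 0)
      $0 = substr($0, RSTART + RLENGTH)
    }
    print out $0
  }'
}
```

A comment body passed through this first could then be checked against the plain-text blacklist entries as usual.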
At this point I figured I had a bit of time to spare, so I looked at what else I could do to protect my site from spammers. It turns out that Six Apart have released a plugin that neatly implements the rel="nofollow" attribute recently announced by Google, Yahoo and MSN.
Now, I don’t believe nofollow will do an awful lot to reduce comment spam. It’s a great idea in principle: by reducing the incentive for spammers to use comments (i.e. no boost to pagerank), hopefully we’ll see a reduction in comment spam. However, the problem is that it requires the majority of bloggers to implement it on their own sites, and many just won’t bother. Having said that, at least it’s something, and the Movable Type ‘nofollow’ plugin made it so easy it was a no-brainer.
Six Apart also suggested that nofollow plugin users upgrade to 2.661, which I did at the same time. It’s not easy to find a download link on 6A’s site anymore for 2.661, but this post from Jay Allen gave me the hint I was looking for. One day I’ll upgrade to MT 3, but that day isn’t just yet.
So now I have a nice, fresh, up to date MT install which has been hardened a little more against comment spam. Of course, there’s still more I could do. The MT-Blacklist Updater would probably be a good idea. But I suspect that the most effective solution would be a captcha test. I know there are accessibility issues with captchas, but they do at least stop automated posting of comments, which is the main goal here. James Seng has written an MT plugin that generates captcha codes. I’d want to do a code review before installation though, to ensure that the relationship between the image generated (with a 6 digit code) and the hidden field (also a 6 digit code) is cryptographically secure.
In the last 48 hours my blog received something like 700 successful (i.e. not blocked by MT-Blacklist) spam attempts. This is in addition to something like 1000 comments that were denied by MT-Blacklist.
One day I’d like to do an analysis of the amount of traffic comment spam is generating on my web site. However, with 1700 incidents in around 48 hours, I’d guess it would be a significant portion of the overall traffic (i.e. 10% or more).
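A rough way to get that number might look something like the sketch below. It makes a few assumptions: the function name spam_share is made up, it reads Apache access log lines on stdin, and it counts every POST to mt-comments.cgi or mt-tb.cgi as spam, which overstates things a little since genuine comments are counted too. It also measures requests rather than bytes transferred.

```shell
# Report what share of requests in an Apache access log are POSTs to the
# Movable Type comment and trackback scripts. Reads log lines on stdin.
# The name "spam_share" is hypothetical.
spam_share() {
  awk '
    { total++ }
    /POST/ && /mt-(comments|tb)\.cgi/ { hits++ }
    END {
      if (total == 0) { print "0 of 0 requests (0.0%)"; exit }
      printf "%d of %d requests (%.1f%%)\n", hits, total, 100 * hits / total
    }'
}
```

Piping an access log into spam_share prints a single summary line with the matching count, the total and the percentage.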
However, the Learning MT Blog has an excellent article Concerning Spam, including a number of techniques for dealing with comment spam that I wasn’t aware of.
Spammers have discovered bloggers and sooner or later if you allow comments, trackback pings, or the Movable Type send-entry form on your weblog you will get spammed.
Feedster is one of those really cool tools that I probably don’t use as much as I perhaps should. Perhaps it’s because Google seems to do a good enough job of finding things for me, or that the sorts of things I’m usually looking for just aren’t in RSS. Feedster Jobs for example is a great idea, but they don’t list any jobs here in Adelaide. Anyway, I’m looking forward to reading more of the Feedster Hacks blog. I’ve already taken its suggestion on adding a Feedster Search link to each weblog entry.
For Movable Type, it would be: <a href="http://feedster.net/links.php?url=<$MTEntryPermalink valid_html="1"$>">Feedster Search Post</a>
While doing that I also took the opportunity to search for everyone who links to jimohalloran.com and turn that into an RSS feed. So now I guess I’ll be using Feedster a lot more than I used to, even if it’s just to see who’s talking about me.
Now that I have an aggregator that’s under my control, I’ve been able to add a blogroll to my home page. This was achieved with a modified version of FoF’s opml.php and an MT Template module.
Here’s how to do it in more detail:
You might want to set up a cron job to run blogroll.php on a regular basis to make sure your blogroll is kept up to date.
Your blogroll should look something like the one on my home page. The little orange XML buttons are done with CSS, and link to the RSS feed. The title links to the HTML version of the site.
It’s a quick hack, but it works for me, and I hope it helps someone else as well.
For a long time I’ve used NewsIsFree as my aggregator. It’s got a couple of cool features (e.g. some scraped feeds for some sites not supporting RSS). The way I’ve always used it was to have it send me an email every day with the news items, which I then read in order to figure out what was new. It wasn’t perfect, far from it. It was often down, and articles disappeared from its database fairly rapidly, but it was good enough. NewsIsFree always seemed to walk a tightrope between being a free service and a pay service, and slowly over time all of the useful features have disappeared into the pay service. Late last week, without warning, the email notification service went this way as well.
I’d been meaning to hack together my own RSS reader anyway, so the recent changes with NewsIsFree prompted me to make a change. What I’m using now is Feed on Feeds. FoF is dead simple to use and built in PHP, making it very hackable. As the site says…
It keeps track of what items you’ve read, and keeps happily checking up on your feeds no matter where you are. Whenever you want to see what’s new, you just bring up a web page and scan the newest items. You can mark the items as read so they won’t be shown again. Or, you can just always show the most recent N items, like the way LiveJournal’s friends pages work.
That’s about all a news reader really needs, to be honest. I wanted a server based solution, as I’m trying to give myself maximum flexibility by using web based apps wherever possible. There are a few little changes I want to make, but by and large FoF does everything I need, sparing me the effort of writing something from scratch.
Over the course of an hour from just before 11pm last night my blog was hit with the worst bout of comment spam I’ve seen yet. Every single post I’ve ever made was spammed with a comment. Comment spam has been on the rise lately, but this is ridiculous.
If you don’t have anything useful to say on my site, don’t say it at all. I’m not interested in whatever drug you’re peddling this week. Comment spam is vandalism, nothing more.