There’s really no debate that, despite all efforts to combat it, spam email continues to grow and thrive on the internet. Since I host my own email server (providing accounts for myself, my family, and a few friends), I’ve watched as gargantuan volumes of unsolicited email stream in over the wire, and I’ve had to keep up to speed on the latest and greatest spamfighting techniques in order to keep our mailboxes reasonably free of the nuisance. That being said, the whole system has always felt like a fragile beast, and when my spam system fails for even a few minutes, my inbox can get buried. (For example, a component of my filters got overloaded this morning for just over six minutes, and over 50 spam emails slipped through in that period.) So, for the past year, I’ve been hunting for ways to optimize my mail setup in order to lessen the load on the spam filters, and one specific way eluded me until this morning. Since I actively searched for this very solution for over a year without success until today, I figured I’d describe what I did in case anyone else is looking for the same fix.

(Really, I shouldn’t have to tell you this, but what follows is an extremely detailed, low-level description of my mail setup and the innards of a spam filtering system. It’s dorky, and you probably won’t want to read the rest unless you’ve imbibed a good deal of caffeine and know your way around sendmail.)

I figure I should start out with a description of my mail server setup, which I’d imagine is pretty similar to quite a few others out there. My mail server is a Linux box, and I’m running sendmail 8.13 and Cyrus IMAP Server as my mail transport and mail delivery agents, respectively. I run SpamAssassin as my spam filter and ClamAV as my virus scanner; sendmail interacts with both of them through milters (spamass-milter for SpamAssassin, and clamav-milter for ClamAV). I host email for about a dozen people across two dozen different domains, so I use sendmail’s virtual user functionality to map each valid email address to a mail server account.

All in all, there’s not a lot out-of-the-ordinary to this setup, but one bit of it has been both frustrating to me and the source of a tremendous load on my spam filters. Watching my log files, I saw the entire mail delivery process play out in the following sequence:

  1. a piece of email would stream in from the outside world and sendmail would accept it;
  2. sendmail would pass it off to the spam filter system using spamass-milter;
  3. the spam filter system would perform its checks and then send it back to sendmail with a spam score;
  4. sendmail would pass it off to the virus scanner using clamav-milter;
  5. the virus scanner would perform its checks and then send it back with a decision on whether any bugs were found;
  6. sendmail would pass it off to Cyrus;
  7. Cyrus would check to see who the recipient of the mail should be, and deliver it to that user’s mailbox.

While this works perfectly for valid users, I started to see a problem when an incoming email was addressed to a nonexistent user. Instead of sendmail checking the validity of the intended recipient’s email address at the very first step, my setup would happily pass the email through the entire system, and only reject it when Cyrus couldn’t find a valid recipient mailbox. This isn’t a huge problem for the occasional misaddressed message, but today’s spammers are fond of dictionary attacks on mail hosts, and if 100 different emails came in at the same moment but only one of them was addressed to a valid user, all 100 would still be passed through the spam filters and virus scanner. (It’s like a college professor grading 100 term papers and then figuring out afterwards that only one of them was actually for the course he’s teaching.) When the load got high enough that the filters started to time out, spam would slip through to me and my users. I’d been searching high and low for a solution for a while, but it wasn’t until today that I figured out what was going wrong. And lo and behold, it turns out that the biggest problem was with my assumptions about how sendmail behaves, and that the solution was in changing those assumptions, and then changing my virtual user setup to match what actually goes on.
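To put rough numbers on that, here is a small Python sketch comparing the two orderings. Everything in it is a hypothetical illustration (the function names, the addresses, and the assumption that each message costs exactly one spam scan plus one virus scan); it is not anything sendmail itself runs.

```python
def scans_when_checked_last(recipients, valid):
    """Recipient is only checked at delivery time, after both filters have run."""
    scans = 0
    delivered = 0
    for rcpt in recipients:
        scans += 2  # assumed cost: one spam scan + one virus scan per message
        if rcpt in valid:
            delivered += 1
    return scans, delivered


def scans_when_checked_first(recipients, valid):
    """Recipient is checked up front, so unknown addresses never reach the filters."""
    scans = 0
    delivered = 0
    for rcpt in recipients:
        if rcpt not in valid:
            continue  # rejected immediately, no scanning
        scans += 2
        delivered += 1
    return scans, delivered


# Hypothetical dictionary attack: 100 messages, only one to a real mailbox.
valid = {"me@example.com"}
incoming = ["me@example.com"] + [f"guess{i}@example.com" for i in range(99)]

late = scans_when_checked_last(incoming, valid)
early = scans_when_checked_first(incoming, valid)
print("checked last: ", late)
print("checked first:", early)
```

Under those assumptions, the same single message gets delivered either way; the only difference is how many scans are spent getting there.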

When sendmail is set up with virtual user hosting, the virtusertable file holds all the information mapping email addresses to mail server accounts. (For example, there are about 40 or 50 different email addresses that all funnel into my one email inbox.) My assumption had always been that when sendmail receives a new email, it looks up the recipient email address in the virtusertable file to figure out who should get the email, and rejects the message if the address doesn’t exist in the file. Thus, I couldn’t understand how nonexistent email addresses were getting through that first step in the processing chain, and all my searching was focused on this one point. After finally finding this discussion thread today, though, I learned that my assumption was wrong: sendmail uses the virtusertable as a supplement to other information about the validity of email addresses, and recipients who don’t show up in that file aren’t necessarily considered to be invalid. Better yet, it’s possible to add a wildcard entry to the virtusertable file that explicitly declares every address at a domain invalid unless it’s listed individually, so sendmail can reject that email immediately and avoid the load of scanning it for spam and viruses.
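The behavior that tripped me up can be modeled in a few lines of Python. This is only a simplified sketch of the lookup order as I understand it (full address first, then the domain-wide entry); the `lookup` function, the table contents, and the account name are all made up for illustration, and real sendmail does more than this.

```python
def lookup(table, address):
    """Return ('deliver', account), ('reject', reason), or ('fallthrough', None).

    Simplified model: try the full address, then the '@domain' wildcard.
    A miss is NOT a rejection; the address just falls through to whatever
    handling comes next.
    """
    address = address.lower()
    domain = address.split("@", 1)[1]
    for key in (address, "@" + domain):
        if key in table:
            value = table[key]
            if value.startswith("error:"):
                return ("reject", value[len("error:"):])
            return ("deliver", value)
    return ("fallthrough", None)


# Hypothetical table without a wildcard entry: unknown users are not rejected.
without_wildcard = {"me@example.com": "me"}

# Same table plus a domain-wide error entry.
with_wildcard = dict(without_wildcard)
with_wildcard["@example.com"] = "error:nouser 550 No such user"

print(lookup(without_wildcard, "nobody@example.com"))
print(lookup(with_wildcard, "nobody@example.com"))
print(lookup(with_wildcard, "me@example.com"))
```

The first call is the case I had wrong for a year: a missing entry falls through instead of bouncing. With the wildcard in place, the unknown address is rejected while the listed one still maps to its account.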

Once I added a single line to the bottom of my virtusertable file:

    @queso.com     error:nouser 550 No such user

the results were exactly what I wanted; I was able to watch my logs as hundreds of invalidly-addressed messages were instantly rejected rather than shunted into the spam and virus filters. (This isn’t hyperbole: in the 73 minutes after I made the change to the queso.com domain, 1,450 messages were rejected, or about one every three seconds.) I tested the setup a bit more and then made the same change for all of the other domains I host, and now everything’s running perfectly.
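If you host a pile of domains, typing that line out for each one gets tedious. Here is a minimal Python sketch that generates the entries; the `reject_lines` helper and the domain list are hypothetical, and the output format simply mirrors the line above.

```python
def reject_lines(domains, message="No such user"):
    """Build one domain-wide virtusertable reject entry per domain."""
    lines = []
    for domain in domains:
        key = "@" + domain.strip().lower()
        lines.append(f"{key}\terror:nouser 550 {message}")
    return lines


# Hypothetical list of hosted domains.
hosted = ["example.com", "example.org", "example.net"]

for line in reject_lines(hosted):
    print(line)
```

The printed lines would be appended to the virtusertable source file. On a typical install that text file then has to be rebuilt into a map with sendmail’s makemap utility before the change takes effect; the exact path and map type depend on your configuration.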

So, if you host your own email server and have noticed an uptick in the load spam’s exerting on your system, you might want to take a look and see if you’re processing the incoming mail in the most efficient way. Now that my setup has been optimized, it’s only the occasional message that makes it all the way through to the spam filter stage; I’m literally seeing more than a thousand messages an hour that are rejected outright rather than forced through the filters. It’s pretty satisfying.
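For anyone who wants to check the rates quoted above, the arithmetic is short enough to sketch in Python. The two inputs are the figures from my logs for queso.com; the variable names are just for illustration.

```python
rejected = 1450  # messages rejected after the change
minutes = 73     # length of the observation window

seconds_per_rejection = minutes * 60 / rejected
rejections_per_hour = rejected / minutes * 60

print(f"one rejection every {seconds_per_rejection:.2f} seconds")
print(f"about {rejections_per_hour:.0f} rejections per hour")
```

That works out to roughly one rejection every three seconds, or a bit under 1,200 an hour.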

Comments

Well, that’s awfully handy. Should I ever be stuck running sendmail, I’ll keep that in mind.

The Qmail system on my rig needed patching so I could add a configuration file detailing which addresses, exactly, were allowed in. The SMTP session throws out anything not on that list, which has lowered the operating overhead on my server considerably. So, basically, we took different approaches to the same basic fix. Ah, I love Internet technology! *wry grin*

• Posted by: GreyDuck on Oct 19, 2006, 5:06 PM

The beauty of Unix — I run my own mail server and we don’t use any of the same packages. I use Postfix, Dovecot, and amavis.

• Posted by: Rafe on Oct 19, 2006, 10:56 PM