Review, Linux E-Mail, set up, maintain, and secure a small office e-mail server

(This is a review that I originally posted somewhere on Slashdot, IIRC.)

Linux E-mail, Second Edition is a book written for Packt Publishing by I. Haycox, A. McDonald, M. Back, R. Hildebrandt, P.B. Koetter, D. Rusenko and C. Taylor. Linux E-mail contains …

How to extract email headers and store them in a cache file

Note: I am trying to publish, a piece at a time, a lot of the tricks that I use in my email management system, in such a format that each of them is usable separately. This is why certain parts of this and other pages may be a bit difficult to understand until I have published all of them. In the meantime, please let me know about anything you find unclear in these pages, so I can improve them, and read my article on how to Build your own email server with Postfix, because it is a good synthesis of the whole picture.

All email messages have a set of hidden headers that can be used to filter email in a million different ways, for example to ignore uninteresting threads in mailing lists. Some kinds of filtering require that you first extract some header from each email message and write it to a file. The rest of this page shows two ways to do this, one for the Mutt email client and another that is much more general and can even be used with webmail, if you have the right environment. The examples below show the code I use to extract the headers of mailing list messages I want to ignore, but the technique can be easily generalized.

The Mutt way

Mutt is a powerful text-based email client that many people (including me) still use not because they like to suffer, but simply because Mutt is much more configurable than all other clients around. In Mutt you can independently define macros for the “index view”, that is the listing of all messages in the current mailbox, and for the “pager view”, which is the one displaying the content of the selected message.

Here are the two macros that you need in order to extract a header from whatever view you’re in and write it to a file:


  macro index X     "|formail -XMessage-ID: | cut -d: -f2  | cut -c2- >> $HOME/.MAIL/ignore.MUA.cache\n"
  macro pager X     "|formail -XMessage-ID: | cut -d: -f2  | cut -c2- >> $HOME/.MAIL/ignore.MUA.cache\n"


In this case, the selected header is Message-ID. We extract it with a chain of three commands. To understand what each of them does, let’s run the first alone, then the first two, then all of them:


  [marco@polaris]$ cat sample_email |  formail -XMessage-ID:
  Message-ID: <BA3D918879B942D48484D4F6B9D1DC6A@example.com>
  [marco@polaris]$ cat sample_email |  formail -XMessage-ID: |  cut -d: -f2
   <BA3D918879B942D48484D4F6B9D1DC6A@example.com>
  [marco@polaris]$ cat sample_email |  formail -XMessage-ID: |  cut -d: -f2 | cut -c2-
  <BA3D918879B942D48484D4F6B9D1DC6A@example.com>
  [marco@polaris Procmail_irrelevant_threads]$


As you can see, the formail program extracts from the message the whole line containing the requested header. The first invocation of the cut utility (`cut -d: -f2`) splits that line into fields, using the colon as separator, and keeps the second one. The last command (`cut -c2-`) strips the first character of its input (the blank that follows the colon), because it takes all the characters of the received string starting from the second one. Therefore, we’re left with the bare Message-ID value, which the macro appends to the cache file.
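
If formail is not available, the same extraction can be approximated with sed alone. What follows is only a sketch under that assumption: the function name extract_message_id is mine, it keeps everything after the first colon (so an identifier that itself contains a colon survives), and it does not handle headers folded over several lines.

```shell
#!/bin/sh
# extract_message_id (hypothetical helper, not part of formail or Mutt):
# read one email message on standard input and print the value of its
# first Message-ID header, without the header name and leading blanks.
extract_message_id() {
    # First expression: quit at the first empty line (end of the headers).
    # Braced block: on a Message-ID line (any letter case), strip the
    # header name and the blanks after the colon, print the rest, quit.
    sed -n \
        -e '/^$/q' \
        -e '/^[Mm][Ee][Ss][Ss][Aa][Gg][Ee]-[Ii][Dd]:/{' \
        -e 's/^[^:]*:[[:space:]]*//p' \
        -e 'q' \
        -e '}'
}

# Small self-contained demonstration with an inline sample message.
printf 'From: someone@example.com\nMessage-ID: <BA3D918879B942D48484D4F6B9D1DC6A@example.com>\nSubject: test\n\nbody text\n' |
    extract_message_id
```

The demonstration prints the same identifier that the formail and cut chain produced. In a Mutt macro, the function body could replace the three piped commands, but I have not tried that with every Mutt version.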

The general way

If you use the maildir format for mailboxes, every mailbox is a directory and every message is a text file in one of its subdirectories. You can still use procmail to add the Message-ID of a message to a cache file in this way.

First of all, create a separate maildir folder (let’s call it .uninteresting/) reserved for uninteresting messages. Second, create a cron job that every few minutes scans all the files in the .uninteresting/ subdirectories and feeds each of them to procmail with a command like this:


  /usr/bin/procmail -m irrelevant_files_recipe.rc < ${CURRENT_FILE}
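
A script for that cron job might look like the sketch below. It is not the script I actually run: the function name feed_uninteresting, the MAILDIR_ROOT and RECIPE locations and the decision to delete each file once procmail has handled it are all assumptions made for the sake of the example.

```shell
#!/bin/sh
# Hypothetical cron wrapper. All the paths below are assumptions:
# adjust them to your own setup.
MAILDIR_ROOT="${MAILDIR_ROOT:-$HOME/Maildir}"
RECIPE="${RECIPE:-$HOME/.procmail/irrelevant_files_recipe.rc}"
PROCMAIL="${PROCMAIL:-/usr/bin/procmail}"

feed_uninteresting() {
    # A maildir keeps delivered messages in its cur/ and new/ subdirectories.
    for CURRENT_FILE in "$MAILDIR_ROOT"/.uninteresting/cur/* \
                        "$MAILDIR_ROOT"/.uninteresting/new/*; do
        # Skip the unexpanded pattern left behind when a directory is empty.
        [ -f "$CURRENT_FILE" ] || continue
        # Assumption: delete the file once procmail has handled it,
        # so that the next cron run does not process it again.
        if "$PROCMAIL" -m "$RECIPE" < "$CURRENT_FILE"; then
            rm -f -- "$CURRENT_FILE"
        fi
    done
}

feed_uninteresting
```

Since the recipe uses relative file names for the cache and the archive folder, it is probably wise to have the script change to the right directory first, or to use absolute paths in the recipe.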


This is the content of irrelevant_files_recipe.rc:


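  # append this message's Message-ID to the cache, holding a lock on it
  :0hc:ignore.cache$LOCKEXT
  | formail -D 524288 ignore.cache

  # then file the message itself in the archive maildir
  :0
  .archive_of_irrelevant_threads/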



This is a recipe that just locks the cache file, adds to it the Message-ID header of the current email with formail and then writes a copy of the message to a dedicated folder (you could just tell procmail to add that copy to /dev/null, which is like not doing it, but I prefer to keep a copy anyway, just in case).

Once you have that maildir folder and the script using this procmail recipe set up, you’re set. Instead of telling your email client to run a macro on the uninteresting messages, simply move them to the .uninteresting/ mailbox. The cron job will find them and tell procmail to add their Message-ID headers to the cache, so the other procmail recipe will block all replies before they enter your mailbox. As you can see, the good thing about this other approach is that it works with whatever email client you use, even if you switch clients, as long as you have access to maildir mailboxes on a Linux/Unix server.

How to ignore uninteresting threads in mailing lists

Even in this age of social networking and instant messaging, mailing lists are very useful tools to get technical support or carry on public discussions online. The problem with mailing lists is that, especially when they are very popular, you will soon find out that most of the traffic is irrelevant, because it’s either some flame war or some topic that is of no interest to you.

This kind of email traffic is a big problem for two reasons:

  1. it is much harder to filter out automatically than spam (unless you use the trick below, of course)
    • Spam is much easier to recognize and stop before it enters your mailbox because it has a much more homogeneous structure. Uninteresting discussions, instead, are not spam: they are simply uninteresting!
    • you cannot filter out single subscribers of the mailing list. If the same person is engaging in a flamewar and in an explanation that you want to follow (a very common event on some lists…) you want to see nothing of the flamewar and all of the explanation in your mailbox
    • you can’t even write filters that delete all messages with uninteresting words in the subject: how do you know all those words in advance?
  2. (this is a consequence of the 1st problem) Since spam is very easy to filter automatically, if you follow many mailing lists you will soon end up receiving many more uninteresting messages than spam ones: in other words, this class of email makes you waste much more time than spam. This is one of the main reasons, in my experience, why many people are somewhat scared of mailing lists

Luckily, there is no reason to endure hundreds of uninteresting messages if you have the right tools. Gmail has a “kill thread” function, but you can get it with any email client (including webmail) if you can use procmail to filter your incoming messages. All you need is to add the recipe below to your procmail filters.

Years ago, I realized that irrelevant mailing list threads were wasting much more of my time than spam. When I asked for help on the procmail mailing list, procmail guru Sean Straw (SBS) wrote this recipe (see the short explanation below):


       1
       2    #============================================================================
       3    # simple recipe to ignore threads based on prior cache of threads to ignore.
       4    # 20061230, SBS
       5
       6    # get In-Reply-To messageid, check to see if it is in the ignore cache or
       7    # in the mua_ignore cache.  formail stores cache with ascii-z terminations,
       8    # but grep will still match the binary file.
       9    # if we have a match in the MUA id file or current cache, ADD the messageid
      10    # of THIS message to the cache, so that replies to it will also be ignored.
      11
      12    # ensure these are blank, not set to something you might have used them for
      13    # previously
      14    REFS=
      15    REFSNL=
      16    # note: each [^ ...] class below must contain one space and one tab
      17    :0
      18    * ^In-Reply-To:.*\/[^ 	].*
      19    {
      20             # Assign the results to REFS
      21             REFS=${MATCH}
      22    }
      23
      24    :0
      25    * ^References:.*\/[^ 	].*
      26    {
      27             # Append the results to REFS
      28             # no consideration as to whether REFS was null or not.
      29             REFS="${REFS} ${MATCH}"
      30    }
      31
      32    # by doing this ONLY if REFS contains non-whitespace, we spare
      33    # ourselves the overhead of the pipe chain invocation when it isn't
      34    # needed (i.e. messages with no references).  Arguably, REFS shouldn't
      35    # be set at all if the headers are empty, but this check is cheap to perform
      36    :0
      37    * REFS ?? [^ 	]
      38    {
      39             REFSNL=`echo "$REFS" | tr -s " \t" "\n\n" | \
      40                     sed -e '/^\([^<].*\|.*[^>]\|\)$/ d'`
      41    }
      42
      43    :0hc:ignore.cache$LOCKEXT
      44    * REFSNL ?? .
      45    * ? grep -qF "$REFSNL" ignore*.cache
      46    | formail -D 524288 ignore.cache
      47
      48    # if the preceding conditions matched, then file this message
      49    # away as irrelevant.
      50    :0A:
      51    .processing.irrelevant_threads/


Here is how it works. Lines 17-22 put the value of the In-Reply-To header in the REFS variable. Lines 24-30 append the value of the References header to it. In-Reply-To contains the unique identifier (Message-ID) of the message to which the current email is a reply, while References lists the identifiers of the earlier messages in the same thread. Lines 36-41 clean and reformat, so to speak, the REFS variable: they write one identifier per line into REFSNL and drop anything that is not enclosed in angle brackets, so it’s easy to use in the following comparison.
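
To see what the cleanup in lines 39-40 does, you can run the same pipeline by hand. The sketch below is my own illustration, not part of Sean’s recipe: the function name split_refs is hypothetical, and the single sed expression is rewritten as three simpler ones that should delete the same lines without relying on GNU-only syntax.

```shell
#!/bin/sh
# split_refs (hypothetical helper): print one message identifier per line
# from a blank-separated list, keeping only tokens of the form <...>.
split_refs() {
    printf '%s\n' "$1" |
        # turn every space or tab into a newline, squeezing repeated ones
        tr -s ' \t' '\n\n' |
        # delete lines not starting with "<", not ending with ">", or empty
        sed -e '/^[^<]/d' -e '/[^>]$/d' -e '/^$/d'
}

# Demonstration: two well-formed identifiers, one stray word, one broken token.
split_refs ' <first@example.com>  <second@example.org> junk <broken'
```

The demonstration prints only the two well-formed identifiers, each on its own line, which is the format that grep needs in line 45.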

Lines 43-51 finish the job: if any identifier in the REFSNL variable is present in a cache file (see “ignore*.cache” in line 45, but it could be any name you like) that contains the Message-ID headers of all uninteresting messages already received, this means that the current message is uninteresting too, because it is a reply to a message already flagged as uninteresting. In that case, the Message-ID of the current message is added to the cache (line 46) and the message itself is archived in a dedicated mailbox (.processing.irrelevant_threads/ in my case, but it may be any other name, of course). Otherwise, nothing happens.
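
The lookup in lines 44-45 can also be tried outside procmail. The following is a sketch under a few assumptions: is_ignored is a hypothetical name, and the cache is a hand-made file that only mimics the NUL-terminated entries mentioned in the recipe comments, not a real formail cache.

```shell
#!/bin/sh
# is_ignored (hypothetical helper): succeed if at least one of the
# newline-separated identifiers in $1 appears in any of the cache files
# given as the remaining arguments.
is_ignored() {
    refs=$1
    shift
    # Same two conditions as the recipe: the list must not be empty, and
    # grep -F must find one of its lines as a fixed string in a cache.
    [ -n "$refs" ] && grep -qF -- "$refs" "$@"
}

# Demonstration with a throwaway cache holding two NUL-terminated entries.
cache=$(mktemp)
printf '<dead@example.com>\0<beef@example.com>\0' > "$cache"
if is_ignored "$(printf '<new@example.com>\n<beef@example.com>')" "$cache"; then
    echo "reply to an ignored thread"
fi
rm -f "$cache"
```

Because grep -F treats each line of its pattern as a separate fixed string, one known identifier among the references is enough to flag the message.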

Since I started using this recipe, I’ve seen a serious drop in the amount of mailing list messages that fill my inboxes. As soon as a thread starts and I flag it as uninteresting, the procmail recipe above takes over and sends all the replies somewhere else, out of sight! Thanks Sean! Of course, you still need to populate the cache file: I explain how to do that in a separate page.

Some things to consider before using this recipe

This is a wonderful recipe because it allows you to follow lots of mailing lists you care about without drowning in an endless flow of irrelevant stuff. However, there are some things you need to understand before using it:

  • the recipe only looks at Message-IDs to decide what is irrelevant: a message is irrelevant if it’s a reply to something already flagged as irrelevant. This means that if someone hijacks a thread that you labeled as uninteresting you won’t see that new discussion. Personally, I am fine with that, but you may think otherwise
  • for the same reason, that is the fact that the recipe only looks at Message-IDs and other standard email headers, it won’t work when you receive replies to irrelevant messages sent from braindead email software that does not set the In-Reply-To or References headers.

Last but not least, a word about killfiles. They are lists of email addresses of people whose messages you never want to see: when your email software gets a message from one of those addresses, it immediately discards it. The limit of normal killfiles on mailing lists (regardless of how you implement them) is that you still receive all the replies that all other subscribers send to the troll. This recipe could easily be used as a super killfile: if you write a separate procmail recipe that immediately flags as uninteresting and then removes any message from JoeTroll, you won’t see the replies to whatever he writes anymore. I am on several mailing lists where doing so to one or two subscribers would cut all flames out, but I do not recommend this solution. Hiding problems with technology only delays them and makes them worse.

If there’s anything which is not clear in this page, please let me know!