How to ignore uninteresting threads in mailing lists

2010-03-24 » Email, email filtering, procmail

Even in this age of social networking and instant messaging, mailing lists are very useful tools to get technical support or carry on public discussions online. The problem with mailing lists is that, especially when they are very popular, you soon find out that most of the traffic is irrelevant to you, because it's either a flame war or a topic that is of no interest to you.

This kind of email traffic is a big problem for two reasons:

  1. it is much harder to filter out automatically than spam (unless you use the trick below, of course)
    • Spam is much easier to recognize and stop before it enters your mailbox because it has a much more homogeneous structure. Uninteresting discussions, instead, are not spam: they are simply uninteresting!
    • you cannot filter out single subscribers of the mailing list. If the same person is engaging in a flamewar and in an explanation that you want to follow (a very common event on some lists…) you want to see nothing of the flamewar and all of the explanation in your mailbox
    • you can’t even write filters that delete all messages with uninteresting words in the subject: how do you know all those words in advance?
  2. (this is a consequence of the first problem) Since spam is very easy to filter automatically, if you follow many mailing lists you will soon end up receiving many more uninteresting messages than spam: in other words, this class of email makes you waste much more time than spam does. This is one of the main reasons, in my experience, why many people are somewhat scared of mailing lists

Luckily, there is no reason to endure hundreds of uninteresting messages if you have the right tools. Gmail has a “kill thread” function, but you can get it with any email client (including webmail) if you can use procmail to filter your incoming messages. All you need is to add the recipe below to your procmail filters.

Years ago, I realized that irrelevant mailing list threads were wasting much more of my time than spam. When I asked for help on the procmail mailing list, procmail guru Sean Straw (SBS) wrote this recipe (a short explanation follows it):

       1
       2    #============================================================================
       3    # simple recipe to ignore threads based on prior cache of threads to ignore.
       4    # 20061230, SBS
       5
       6    # get In-Reply-To messageid, check to see if it is in the ignore cache or
       7    # in the mua_ignore cache.  formail stores cache with ascii-z terminations,
       8    # but grep will still match the binary file.
       9    # if we have a match in the MUA id file or current cache, ADD the messageid
      10    # of THIS message to the cache, so that replies to it will also be ignored.
      11
      12    # ensure these are blank, not set to something you might have used them for
      13    # previously
      14    REFS=
      15    REFSNL=
      16
      17    :0
      18    * ^In-Reply-To:.*\/[^    ].*
      19    {
      20             # Assign the results to REFS
      21             REFS=${MATCH}
      22    }
      23
      24    :0
      25    * ^References:.*\/[^    ].*
      26    {
      27             # Append the results to REFS
      28             # no consideration as to whether REFS was null or not.
      29             REFS="${REFS} ${MATCH}"
      30    }
      31
      32    # by doing this ONLY if REFS contains non-whitespace, we spare
      33    # ourselves the overhead of the pipe chain invocation when it isn't
      34    # needed (i.e. messages with no references).  Arguably, REFS shouldn't
      35    # be set at all if the headers are empty, but this check is cheap to perform
      36    :0
      37    * REFS ?? [^    ]
      38    {
      39             REFSNL=`echo "$REFS" | tr -s "  " "\n\n" | \
      40                     sed -e '/^\([^<].*\|.*[^>]\|\)$/ d'`
      41    }
      42
      43    :0hc:ignore.cache$LOCKEXT
      44    * REFSNL ?? .
      45    * ? grep -qF "$REFSNL" ignore*.cache
      46    | formail -D 524288 ignore.cache
      47
      48    # if the preceding conditions matched, then file this message
      49    # away as irrelevant.
      50    :0A:
      51    .processing.irrelevant_threads/

Here is how it works. Lines 18-22 put the value of the In-Reply-To header in the REFS variable. Lines 25-30 append the value of the References header to it. In-Reply-To holds the unique identifier (Message-ID) of the message to which the current email is a reply, and References lists the Message-IDs of the earlier messages in the same thread. Lines 37-41 clean up and reformat the content of REFS, leaving one Message-ID per line in the REFSNL variable, so that it is easy to use in the comparison that follows.
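To make the clean-up step in lines 37-41 more concrete, here is a minimal shell sketch that can be run outside procmail. The function name clean_refs and the sample Message-IDs are made up for illustration, and the sed filter is written as a portable equivalent of the one in the recipe (keep only tokens that start with "<" and end with ">"), not as a copy of it.

```shell
#!/bin/sh
# clean_refs (hypothetical helper): split a header value on spaces and tabs,
# one token per line, then keep only tokens of the form <...>.
clean_refs() {
    printf '%s\n' "$1" | tr -s ' \t' '\n\n' | sed -e '/^<.*>$/!d'
}

# Made-up sample: what REFS could hold after lines 18-30 of the recipe,
# with a stray word, a double space and a tab between the Message-IDs.
REFS=$(printf '<a1@example.org> junk  <b2@example.org>\t<c3@example.net>')

REFSNL=$(clean_refs "$REFS")
printf '%s\n' "$REFSNL"
# prints:
# <a1@example.org>
# <b2@example.org>
# <c3@example.net>
```

With one Message-ID per line, the result can be handed to grep -F as a list of fixed strings, which is what the next part of the recipe does.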

Lines 43-51 finish the job: if any of the Message-IDs collected from REFS is present in a cache file (see “ignore*.cache” in line 45, but it could be any name you like) that contains the Message-IDs of all uninteresting messages already received, this means that the current message is uninteresting too (because it is a reply to a message already marked as uninteresting). In that case, the Message-ID of the current message is added to the cache (line 46) and the message itself is archived in a dedicated mailbox (.processing.irrelevant_threads/ in my case, but it may be any other name, of course). Otherwise, nothing happens.
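The decision made in lines 43-51 can be simulated with a few lines of shell. This is only a sketch under stated assumptions: the cache here is a plain text file with one Message-ID per line (the real ignore.cache is written by formail with NUL-terminated entries), and the function names is_ignored and handle, as well as all the Message-IDs, are invented for the example.

```shell
#!/bin/sh
# Plain-text stand-in for ignore.cache (assumption: one Message-ID per line).
cache=$(mktemp)
printf '%s\n' '<flame-1@example.org>' '<flame-2@example.org>' > "$cache"

# is_ignored (hypothetical): succeed if any ID in $1 (one per line) is in cache $2.
# The -n guard matters: an empty pattern would make grep -F match every line.
is_ignored() {
    [ -n "$1" ] && grep -qF -e "$1" "$2"
}

# handle (hypothetical): $1 = cleaned references, $2 = this message's own ID,
# $3 = cache file. On a match, remember the new ID so that replies to this
# message are caught too, and report where the message would be filed.
handle() {
    if is_ignored "$1" "$3"; then
        printf '%s\n' "$2" >> "$3"
        echo irrelevant
    else
        echo inbox
    fi
}

refs=$(printf '%s\n' '<root@example.org>' '<flame-2@example.org>')
handle "$refs" '<reply-9@example.org>' "$cache"                   # irrelevant
handle '<reply-9@example.org>' '<reply-10@example.org>' "$cache"  # irrelevant
handle '<other@example.org>' '<reply-11@example.org>' "$cache"    # inbox
rm -f "$cache"
```

The second call shows why the recipe adds each ignored message to the cache: a reply that references only the newest message in an ignored thread is still recognized.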

Ever since I started using this recipe, I’ve seen a serious drop in the number of mailing list messages that fill my inboxes. As soon as a thread starts and I flag it as uninteresting, the procmail recipe above takes over and sends all the replies somewhere else, out of sight! Thanks Sean! Of course, you still need to populate the cache file: I explain how to do that in a separate page.

Some things to consider before using this recipe

This is a wonderful recipe because it allows you to follow lots of mailing lists you care about without drowning in an endless flow of irrelevant stuff. However, there are some things you need to understand before using it:

  • the recipe only looks at Message-IDs to decide what is irrelevant: a message is irrelevant if it is a reply to something already marked as irrelevant. This means that if someone hijacks a thread that you labeled as uninteresting you won’t see that new discussion. Personally, I am fine with that, but you may think otherwise
  • for the same reason, that is the fact that the recipe only looks at Message-IDs and other standard email headers, it won’t work on replies to irrelevant messages that are sent from braindead email software which does not set those headers properly.

Last but not least, a word about killfiles. They are lists of email addresses of people whose messages you never want to see: when your email software gets a message from one of those addresses, it immediately discards it. The limit of normal killfiles on mailing lists (regardless of how you implement them) is that you still receive all the replies that all other subscribers send to the troll. This recipe could easily be used as a super killfile: if you write a separate procmail recipe that immediately flags as uninteresting and then removes any message from JoeTroll, you won’t see the replies to whatever he writes anymore. I am on several mailing lists where doing so to one or two subscribers would cut all flames out, but I do not recommend this solution. Hiding problems with technology only delays them and makes them worse.

If there’s anything which is not clear in this page, please let me know!