Review, CentOS System Administration Essentials

5-dollar promo (FYI, Packt offers this and many other titles at just 5 USD until January 6th, 2015.)

I just finished reading a copy of “CentOS System Administration Essentials”, written by Andrew Mallett, which I got from the publisher for review. Here is what I found.

Verdict: a good book, apart from a couple of (small) points

I have enjoyed reading this book, which I am going to call CSEA from now on for brevity. It is indeed a useful, concise tool for beginner system administrators. At 174 pages in the printed version, CSEA provides only the essentials, as its title honestly says, but it does explain them well.

I only have two “negative” (note the quotes) things to say about CSEA, and neither of them is serious. The first is about Chapter 1, titled “Taming vi”, which explains a few tricks of both vi/Vim and the Unix command line. In my opinion, those pages are not useful as a starter (even just a motivational one) for readers who have never seen those tools before, and too little for everyone else. In other words, while there is nothing technically wrong in that chapter, it doesn’t add any value to the book.

The second “critique” I have about CSEA concerns the title itself: something good for the reader, after all, but potentially bad for the author and the publisher. Rather than “CentOS System Administration Essentials” I’d have called this book something like “Learning Linux System Administration Essentials, using CentOS as a reference system“.

By this I mean that most of the content is valid on the great majority of GNU/Linux distributions, with the obvious exception of Chapter 4, which covers RPM and YUM. That’s why I said “good for readers”: CSEA is also useful for people who are not going to use CentOS, RHEL or Fedora, and stays useful even for those who do, if/when they move to another distribution.

The “potentially bad” part is that, for the very reason I just explained, the title seems misleading. People who judge a book only by its cover (I mean its title) may not buy CSEA just because they are currently using another distribution, even if they would benefit from the book.

What’s in the book?

CSEA is explicitly aimed at people who are considering a career as Red Hat Enterprise Linux administrators but, as I said, it is immediately useful to whoever wants to follow best practices in Linux administration, whatever the flavour. In general, all chapters deliver what the book preface promises: clear explanations of some essential concepts, concise enough to make for a quick read, and with just enough practical details to make further reading on each subject much easier. Here is the list of chapters, with a few comments where needed:

  • Taming vi: see above
  • Cold Starts: the GRUB boot loader, and how to customize its behavior
  • CentOS Filesystems: again, good content with a slightly misleading title. Permissions, hard and soft links, SUID and sticky bits, the main features of BTRFS: all stuff that every Linux administrator must know, and valid on any distribution, not just CentOS and its relatives
  • RPM packages and YUM: learn how to prepare your own RPM packages and local software repositories
  • Linux processes: a nice presentation of little-known but precious tools such as pgrep, pstree and pkill
  • User management: I liked the coverage of getent and quotas, as well as the “user creation” script
  • LDAP, or how to manage user accounts on many computers, from one single place
  • The Nginx Web server: basic configuration for a LEMP stack (Linux + Nginx (the “E”, from its pronunciation “engine-x”) + MySQL + PHP). Here I’d have liked a few more pages with one complete, real-world example, e.g. how to make WordPress run with Nginx, but it’s still a good chapter!
  • Configuration management with Puppet
  • Security: Pluggable Authentication Modules (PAM) and SELinux essentials, plus some password hardening tricks
  • “Graduation Day”: a summary of the whole book, plus some extra best practices for SSH, Nginx and OpenLDAP
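
As a taste of the process tools praised in the list above, here is a minimal shell session (a sketch of my own, not taken from the book; the sleep command is just a disposable stand-in for any long-running process):

```shell
# Start a disposable long-running process to experiment on:
sleep 300 &

# pgrep prints the PIDs of processes whose command line matches a pattern:
pgrep -f 'sleep 300'

# pstree shows the whole process hierarchy at a glance (if installed):
command -v pstree >/dev/null && pstree -p | head -n 5 || true

# pkill sends a signal (SIGTERM by default) to every matching process:
pkill -f 'sleep 300'
```

Note that the -f switch makes pgrep and pkill match against the whole command line; without it, they match only the process name.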

Review, The CentOS 6 Linux Server Cookbook

The CentOS 6 Linux Server Cookbook is a Packt Publishing title first published in April 2013. You can buy it in paper format (about 370 pages) or as an ePub or PDF file (the PDF is black and white only, whereas the ePub version is in colour). In general I believe, especially in these times of PRISM and widespread economic crisis, that the more people learn how to run their own Free Software servers, the better. I’ve already explained how and, above all, why we should all do this with email and (at least) social networking and online publishing. That’s why, when Packt asked me to review the Cookbook, I accepted.

How is the Cookbook?

The complete Table of Contents, which lists all the included recipes, is available on the Packt website, so I’ll just summarize it here. After chapters on installation and initial configuration, there are others devoted to:

  • Managing Packages with Yum
  • Securing CentOS
  • Working with Samba and Internet Domains
  • Running Database, Email, WWW and FTP servers

Almost all recipes have the same four-part structure. After an introduction explaining the goal of the recipe, a “Getting Ready” section tells you what to read, check or do before applying it. The “How to do it” part is the actual recipe: a clear sequence of commands to type, or things to write in configuration files. The “How it works” part answers the question “So what did we learn from this experience?”: it goes back to the beginning of the recipe and comments on each single step again, adding many details and explaining why and how each instruction relates to the others.

Finally, many recipes also have a “There’s more” section, which describes corner cases or variants of the basic procedure. Some expert Linux users may find many “How it works” sections a bit too repetitive and/or filled with unnecessary details, if not just this side of boring. I consider this a likely possibility because… I had just that feeling myself, several times.

Then again, this is not a book targeting people who are already experts. This is a cookbook to get started quickly, without making dangerous mistakes, on the way to becoming an expert, and it clearly says so at the very beginning:

rather than being about CentOS itself, this is a book that will show you how to get CentOS up and running. It is a book that has been written with the novice-to-intermediate Linux user in mind who is intending to use CentOS as the basis of their next server.

From that perspective, the “repetitions” are much more a feature than a bug. Besides, being cleanly contained in the “How it works” sections, they don’t really slow down readers who just need to learn some commands or refresh their memory with the details of some procedure, so I don’t mind them!

Looking at the recipes that were chosen for the cookbook, the initial chapters are very thorough. There is practically everything you need to install CentOS and get started with it. My only nitpick there is that I wouldn’t suggest that people run yum -y update before explaining, in another recipe, that the `-y` switch won’t ask for confirmation. Even Chapter 7, on DNS and BIND, has all the basic information.

Chapters of the last group (“Running … servers”), instead, are less complete, which is both… “bad” and good, for reasons I’ll explain in a moment. As far as the rest of the book goes, what is there is good: pertinent content, written as simply as possible. Things that, instead, are not in the Cookbook, and that the next edition should include more recipes on, are:

  • partitioning and backup strategies (even of databases)
  • SSH configuration
  • running virtual machines in a CentOS server
  • print services

If it were up to me, I’d trim the Install and FTP chapters (especially the latter!) to make room for recipes on these topics in the next edition.

About the “Running servers” chapters

The recipes in the last three or four chapters cover the minimum one has to do to get those servers up and running without hurting one’s users and the rest of the Internet in the process. They do it well, but proper configuration and administration of database, WWW and email services require much more. While some potential readers may find it “bad” that the cookbook doesn’t have more on those topics, it is, instead, a good thing.

Almost all the configuration issues and other headaches that I have had over the years with my database, WWW and email servers were “internal” issues. Some were due to bugs in the software, many more to unusual requirements I had, or to mistakes I made. In other words, they didn’t depend at all on what distribution those servers were running. This is why it is good that a CentOS cookbook doesn’t spend too much time on certain topics. You will have to go to other places anyway to make email or any LAMP CMS really usable, so why bother?

Is this a book worth buying?

Yes. All in all, I consider this Packt title quite a useful book for beginner CentOS server administrators. I use CentOS myself on my personal Web and email servers. Even within the limits I just explained, if I had had such a cookbook when I first set them up, it would have saved me a considerable amount of time, simply by having most of what I needed to do in one place, all explained in one consistent way.

Another reason for buying a book like this is that CentOS and other GNU/Linux distributions specifically developed for servers have both longer release cycles and fewer differences between one release and the next than environments like Fedora and Ubuntu. In other words, this is a book that will remain current longer than many other ICT titles, and most of it would be usable even on other server distributions.

Two questions about (Free) software development, copyright assignment and non-commercial use

update 2012/07/13: I have realized only today that since March 2012 Scratch has also been available under the GPL v2 license.

Recently there have been two separate discussions, on two Italian Free Software mailing lists, about the meaning and obligations of the license for the source code of the educational software Scratch. In both cases, I asked for (and received) confirmation of my understanding of the license directly from the “Help@Scratch” team. The discussions were about two topics that are, I believe, interesting and relevant for everybody considering the usage or development of Free Software, especially (but not only) in educational and non-profit contexts. Therefore, I’m publishing the answers I got from the Scratch team here (with their approval), hoping they may be useful to everybody else with the same doubts in the future (but please see the disclaimer below!).

Does contributing your code to some FOSS project automatically give away your copyright on that code?

Point 4 of the Scratch source code license says:

copies or derivative works must retain the Scratch copyright notice and license

Reading this, some members of those mailing lists said that they would NOT use Scratch, and would actively recommend that others NOT use it, because they understood the sentence above to mean something like: if you write extra code to modify or extend Scratch, you HAVE to assign the copyright of your changes to the Scratch team.

My understanding, instead, was that this means that if I modify the source code and (respecting all the other conditions) redistribute the resulting program:

  1. I must include with all the source code the original notice saying that the copyright of the original code belongs to the original authors
  2. but the copyright of the code I write remains with me. In other words, I do not need to assign the copyright of my modifications to the Scratch team in order to contribute.

In other words, I’ve always understood (even before the OpenOffice/LibreOffice saga confirmed it to me) that contributing your code to a FOSS project, under whatever license, doesn’t automatically mean that the copyright of that code goes to the original licensor. The answer from the Scratch team confirmed this interpretation, in two separate messages:

the copyright belongs to whoever makes the modification. However, they are compelled by the terms of the license to make their code available to others under the Scratch license. So the modified code must be made available to others under the same terms that the original source was made available to the programmer that modified it. But that’s not the same as assigning the copyright to us.

A programmer of Scratch added his interpretation:

The author of any modifications to Scratch owns the copyright to any new code that they write. The copyright for their code is NOT automatically assigned to MIT. MIT could not, for example, include their changes in a Scratch release without their permission. However, under the "share alike" clause in the Scratch license, they must share the source code for their changes. "Share alike" clauses are quite common in open source licenses.

A derivative work is bound by all the terms of the Scratch license, including the non-commercial clause, so creators of modified or extended versions of Scratch cannot sell their derivative works without permission. However, anyone who modifies Scratch is allowed to use and distribute their derivative work for free, or use it in other ways.

For example, the Open University created a modified version of Scratch that they hope will be used by over 10,000 students over the next five or six years. They are allowed to do that under the terms of the Scratch license.

Note that the Scratch source license is irrelevant to people who merely wish to use Scratch (which includes most students and educators). The license for the "binary" Scratch packages does not even include a non-commercial clause, so people can, for example, distribute it in an educational package or package it with a book.

Can you be paid to develop software “for non-commercial use only”?

The other topic of discussion was about another point of the Scratch source code license:

The Scratch source code license allows you to distribute derivative works based on the Scratch source code for non-commercial uses subject to the following restrictions...

The mailing list got stuck on this case:

  • suppose some public school, Public Administration (PA), NGO… decides it needs a modified version of Scratch for non-commercial use, for example to use it internally, or to give it away for free in order to offer free classes to kids
  • but the only programmer who’s skilled enough and available to modify the software wants to be paid by the school or NGO. Is this allowed by the license?

My (Marco’s) understanding is that if that school/NGO/PA pays the programmer to modify the Scratch code and to release, for free, both the modified source code (under the same license as Scratch) and the executable program, they aren’t violating any license. Why? Because what I see being used commercially here are only the skills and time of the programmer, and any tools (e.g. compilers) he or she may use to modify the program, NOT the source code or binaries of Scratch (or of any other software licensed with a non-commercial clause) in and by themselves.

Others, instead, said that the license means that nobody can ask or take money for building the derivative works, even if those derivative works will never be sold or used to make money.

When I asked “Help@Scratch” again who was right, I got this answer:

You are correct. The non-commercial clause just means that you can't make a modification and then sell the finished product. It's totally fine to pay programmers to work on it... [that clause is there because] especially in the early days of Scratch before it was well known, we wanted to be sure some company with a big marketing budget didn't take the code, change the About page, and then sell it to schools for $50 a license. We're a non-profit, and our marketing budget is $0, so we have to be careful about such things. :)

DISCLAIMER: I published this exchange for two reasons. First, I hope that it will help to clarify the meaning of “copyright assignment” and “non-commercial use” where software is concerned, especially to non-native English speakers. Second, I hope to gather more feedback and info on these topics. This said, I am not a lawyer, and laws differ from country to country! Therefore, do not assume that this page is 100% accurate and complete! Ask a lawyer in your country if you need a definitive, up-to-date answer (and in that case, please let me know!). Thanks.

How to download RSS feeds with a simple script

Background

RSS is a wonderful system to get headlines of online news from many independent sources and browse them as quickly as possible, without subscribing to any website, giving away personal information and/or depending on any third-party website to aggregate everything for you.

In order to save time, and to not depend on any RSS reader, I have written two simple scripts. One downloads all the RSS feeds I want to read and saves them in a format suitable for further processing. The other reads that temporary file and generates one single HTML page with all the news titles and links. The format of the temporary file is very simple: it’s just plain text with three fields per line, separated by a “|” (pipe) character: feed name, article title and article URL. Here’s an example of that file:


  Repubblica|Pisani, difesa a oltranza "Non ho dato notizie a Iorio"|http://example.com/url_1.html
  Repubblica|Caffe, sigaretta, persino l'email cosi' la pausa diventa un privilegio|http://example.com/url_2.html
  Repubblica|L'ultimo trucco "ad aziendam" di Berlusconi il 'padrone' del paese corrompe la democrazia|http://example.com/url_3.html


The real problem is how to generate that file, that is how to download, parse and reformat RSS from the command line. Here’s how I do it. It works almost perfectly, with one exception explained below, for which I ask for your help.

RSS downloader script

The simplest way I’ve found to download and parse RSS feeds is the Python feedparser module. Once it is installed, it only takes 15 lines of code to generate the list shown above:


   1  #! /usr/bin/python
   2
   3  import sys
   4  import feedparser
   5  import socket
   6
   7  timeout = 120
   8  socket.setdefaulttimeout(timeout)
   9
  10  feed_name = sys.argv[1]
  11  feed_url  = sys.argv[2]
  12  d         = feedparser.parse(feed_url)
  13
  14  for s in d.entries:
  15      print feed_name + "|" + unicode(s.title).encode("utf-8") + "|" + unicode(s.link).encode("utf-8") + "\n"


The script takes as arguments the feed name and the RSS URL (lines 10 and 11). Line 12 is the one that actually downloads the feed and saves all its content in an object named “d”. The timeout in lines 7/8 is needed to keep the script from freezing when some website is unreachable. The last two lines look at each element of the RSS object and print (together with the feed name) the title (s.title) and URL (s.link) of each entry. That’s it, really.
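
The second script, which converts that output into HTML, is not listed here; for reference, a minimal converter along the same lines (a sketch with a hypothetical input file name, not my actual script) could be:

```shell
#!/bin/bash
# Sketch: read the pipe-separated "feed|title|url" file and emit one
# HTML page with a link per article. The input file name is an example.
INFILE=${1:-/tmp/rss_items.txt}

echo '<html><head><meta charset="utf-8"></head><body><ul>'
if [ -r "$INFILE" ]; then
    awk -F'|' 'NF == 3 { printf "<li>%s: <a href=\"%s\">%s</a></li>\n", $1, $3, $2 }' "$INFILE"
fi
echo '</ul></body></html>'
```

Fed with the example file shown earlier, it prints one list item per article, with the feed name, and the article title linked to its URL.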

One little problem: encoding

As I said, the script works almost perfectly as is, and I hope you’ll find it useful. The only problem I haven’t solved yet is how to handle non-ASCII characters in URLs and, especially, news titles. As an example of what I mean, here’s what I get when I convert to HTML the three lines shown above.

(screenshot: the HTML generated from the three lines above, with the accented letters garbled)

(In case it matters, this happens on Fedora 14 x86_64.) As you can see, the accented letters are messed up. Similar things happen with quotes and other non-ASCII stuff. How do I fix this? Before I added the encode("utf-8") call it looked even worse (**), but there’s still something missing here. I have tried to figure out what, but I must say the relevant Python documentation isn’t so simple and easy to find (or at least to recognize), so your feedback is very welcome. Thanks!

(**) This is why I believe that the problem is, and should be fixed, in the Python script itself, and not in the other script that creates the HTML page, but I may be wrong. Regardless, I want to understand better how encoding is handled in Python.

Major gaps of OpenOffice Impress versus Microsoft PowerPoint: what do you think?

Yesterday Sergio, a user of OpenOffice Impress, sent to the OpenOffice.org discussion list his list of the “Major Gaps of OpenOffice Impress 3.3 vs. Microsoft Office PowerPoint”.

Sergio compiled the list because, as much as he likes OpenOffice, “after struggling for over 1 year, sadly he had to stop using Open Office Impress and go back to Microsoft Power Point”.

Personally, I have experienced and can confirm most of what Sergio lists as “File Processing issues”. I haven’t encountered the other problems, but that may be because I use Impress very little these days, and I only need it for very simple slideshows. I don’t even know yet, for lack of personal experience, if and how the current LibreOffice version of Impress would be different. However, I am very curious to know if such differences exist. Above all, since I strongly support the OpenDocument format used by OpenOffice, LibreOffice and many other software programs, I want these issues to be solved.
Therefore, after speaking with Sergio, I’ve reformatted his report and put it here, where it’s easier to find than as a mailing list attachment, and easier to comment on without subscribing to a mailing list. Your feedback is welcome!

Impress File Processing issues

  • Slow processing speed, even on high-performance PCs (major problem!): many tasks are performed very, very slowly!
  • Cutting slides: very, very slow
  • Copying and pasting slides from one impress file to another: very, very slow
  • Registering a change to a slide, even just to its text: quite slow
  • Saving files: very slow
  • Opening files: very slow

Copying and pasting slides from one Impress file to another

When a graphic is present in the slide layout, it gets deleted when the slide is copied and pasted into the destination file (major problem).
The colour format of the slide in the source file gets changed when the slide is pasted into the destination file. (In PowerPoint, when you paste a slide into the destination file, you are asked whether to retain the original format, including colours, layout graphics, etc.)

Changing page (slide), to the following or to the previous one, in normal view

In “normal” view, it is not possible to move easily from one page to the following or the previous one using, for instance, the side scroll bar or the mouse scroll wheel. This is possible only when the zoom size of the page/slide is very small, not at working size. You necessarily have to click on the new slide in the left frame with the slide miniatures. This is very cumbersome.

Icons view

It is not possible to view all the icons of the formatting toolbar unless you set a very large window size. Please make it possible to arrange the toolbar in two lines, even when it is integrated in the menu bar.
Please make it possible to change the order of icons within a toolbar.

Formatting in OpenOffice.org Impress

Bulleted lists: I can’t easily and automatically set a space or a tab between each bullet and the first character of the paragraph (this option is present in OpenOffice Writer).

Increasing or decreasing the indent of a paragraph or a bulleted list: I can’t get the left-to-right or right-to-left icons to appear in the Formatting toolbar, and therefore it is difficult to increase or decrease the indent (this option is present in OpenOffice Writer).

Multiple selection of non-consecutive text: it is not possible, within the same text cell, to select multiple non-consecutive words or sentences, or different non-consecutive items of a bulleted list (these options are available in OpenOffice Writer using “CTRL”).
Similarly, within a table, it is not possible to select multiple non-consecutive words, sentences or cells (this is possible in OpenOffice Writer using “CTRL”).

Formatting multiple text cells at the same time: after you select multiple text cells, the “Formatting” toolbar disappears. Therefore, you have to go to the Edit toolbar, or right-click, and make one change at a time to the text format, which is very time-consuming.

Formatting tables: there is no way to select a column or a row just by putting the cursor at the top of the column or at the beginning of the row.

Changing the column width: when putting the cursor onto a column border (starting from the second column from the left), then clicking and dragging it to enlarge or reduce the column width, there is no way to retain the original width of the neighbouring columns (this is partly possible in OpenOffice Writer by holding down CTRL at the same time).

When the file is saved and re-opened, and especially when an Impress file is saved as Microsoft PowerPoint and then re-opened as an Impress file, tables often get increased line-spacing (very difficult to reduce back) and, consequently, increased overall height, so that they often end up outside the slide (major problem!).

How to automatically replace files when updating WordPress

WordPress is quick and easy to install and update, but the quicker you can make these operations the better, right? If you have shell access to the server where your WordPress copy is installed, you can perform all the operations in Step 1 of the Manual WordPress Update Procedure with the shell script below. It will save you a few minutes, which may seem too little, but is great if you maintain more than one copy of WordPress. That, however, is not the main reason to use a script like this. Its bigger advantage is reducing the possibility of the human errors that come with doing things by hand, whether at the prompt or with the mouse.

The script takes two parameters: the first is the complete URL of the zipped WordPress files that you need in order to upgrade. The second is the directory, called WP_ROOT, where the WordPress installation you need to upgrade lives. You will note that in my script $WP_ROOT is only the name of that directory, not the complete path to it, which in the script is $HTML_ROOT/$WP_ROOT. I have done this because I have several independent copies of WordPress running on the same server, all living in different subdirectories of my $HTML_ROOT, each named after its blog. Therefore, this choice of variables sounds more natural to me. When I want to update http://stop.zona-m.net, I will type `upgrade_wordpress.sh http://wordpress.org/wordpress-3.1.1.zip stop`. When I want to update http://strider.zona-m.net, instead, I will type “strider” instead of “stop”, and so on. Here is the script. Enjoy it, but remember to read the warnings at the end of this page first!


  #! /bin/bash
  # upgrade_wordpress.sh
  # This script performs automatically the operations described in the section
  # "Manual Update - Step 1" of http://codex.wordpress.org/Updating_WordPress
  # Copyright 2011 M. Fioretti http://stop.zona-m.net
  # Released under GPL V2 license
  #
  # $1 URL of latest WordPress version, zipped
  # $2 Root directory of the WordPress installation that must be upgraded

  WP_ROOT=$2
  HTML_ROOT=/var/www/html/wp
  TEMP_DIR=/tmp/temp_wp_update

  # Step 1.1 and 1.2: get latest WordPress, unpack it in temporary directory
  mkdir $TEMP_DIR
  cd    $TEMP_DIR
  wget  $1
  unzip -q wordpress*.zip

  # Step 1.3: Delete the old wp-includes and wp-admin directories
  rm -rf $HTML_ROOT/$WP_ROOT/wp-includes $HTML_ROOT/$WP_ROOT/wp-admin

  # Step 1.4: "upload" the new wp-includes and wp-admin directories
  cp -r -p $TEMP_DIR/wordpress/wp-includes $TEMP_DIR/wordpress/wp-admin $HTML_ROOT/$WP_ROOT

  # Step 1.5: "Upload" the individual files from the new wp-content folder
  # to your existing wp-content folder, overwriting existing files.

  cd $TEMP_DIR/wordpress/wp-content/
  find . -type f | sort  > $TEMP_DIR/new_wp_content_files_list
  tar cf $TEMP_DIR/wp-content-files-to-replace.tar -T $TEMP_DIR/new_wp_content_files_list
  cd  $HTML_ROOT/$WP_ROOT/wp-content
  tar xf $TEMP_DIR/wp-content-files-to-replace.tar

  # Step 1.6: "Upload" all new loose files from the root directory of the new
  # version to your existing WordPress root directory

  cd     $TEMP_DIR/wordpress
  tar cf $TEMP_DIR/loose_files_in_wp_rootdir.tar license.txt readme.html *php
  cd     $HTML_ROOT/$WP_ROOT
  tar xf $TEMP_DIR/loose_files_in_wp_rootdir.tar

  # Final step: remove temporary files, remember to check the configuration

  rm -rf $TEMP_DIR
  echo "you should take a look at the wp-config-sample.php file, to see if any new settings have been introduced that you might want to add to your own wp-config.php"



Warnings

If you reload the WordPress admin page just after running the script you’ll get the “let’s upgrade the database” button, and everything should be OK after that. I have tested this script myself to update both http://tips.zona-m.net/ and this website from WordPress 2.8.1 on a CentOS VPS, and as far as I can see it worked just fine.

As with everything done through shell scripts, however, this thing is powerful enough both to save you time and to hurt your blog, if there is some bug or if something changes in the official WordPress instructions before I notice it and update this page. Of course, the script is provided as is, without warranties, and you should (because you should do it anyway!!!) back everything up right before running it.
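
To make that last warning concrete, a minimal backup could look like this sketch (all paths, the database name and the user are examples, not values from my real setup):

```shell
#!/bin/bash
# Minimal pre-update backup sketch. All names below are examples:
# point WP_DIR at your WordPress directory, and adjust the mysqldump
# line to your own database name and user.
WP_DIR=${WP_DIR:-/var/www/html/wp/stop}
BACKUP_DIR=${BACKUP_DIR:-$HOME/wp-backups}
STAMP=$(date +%Y%m%d)

mkdir -p "$BACKUP_DIR"

# Archive all WordPress files (themes, plugins, uploads, wp-config.php):
if [ -d "$WP_DIR" ]; then
    tar czf "$BACKUP_DIR/wp-files-$STAMP.tar.gz" "$WP_DIR"
fi

# ...plus a dump of the blog database, for example:
#   mysqldump -u wp_user -p wordpress_db > "$BACKUP_DIR/wp-db-$STAMP.sql"
```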

The script could have been even shorter. I deliberately wrote it in that way to have the same steps as the procedure described at WordPress.org, and make it easier to update if/when something changes in WordPress. If you find errors or ways to improve it, please write them as a comment and I will update the page as soon as possible. Thanks.

Dbmail? A great Open Source email system, especially for LAMP/MySQL administrators

A couple of weeks ago, I was thinking about how I might build an advanced search utility for my own email archive. One way to make complex queries on the archive seemed to be to put it all into a relational database. Since the Dbmail system stores email in that way, I asked its developers and Harald Reindl (an email administrator at The Lounge who already uses Dbmail; I found him in the Postfix mailing list archives) whether Dbmail could be used in that way.

The feedback I got made me change my mind about how to rebuild my own email search system, for the reasons explained below. At the same time, how and why Harald and his company use Dbmail seemed really interesting. Here’s the story.

About doing complex email searches with Dbmail

Harald explained to me that:

you should only do searches in the mail client, via IMAP. The Dbmail database is not nice to search for messages directly, because they are split into all their MIME parts and many db records. Therefore, even if your search is successful, it would be hard to get a complete message back without studying the Dbmail sources. And since the search runs over IMAP, with the capabilities of IMAP, you can’t do more complex searches even if the backend is a relational database. However, searches can be faster, just because the messages are split and indexed on the server side. Then again, Dovecot in its latest versions builds an index too, so I would not expect any difference. It is surely possible to write a backend in PHP or some other language to search the Dbmail database directly, but be careful about references, so as not to display messages not owned by the user who starts the search

Why you may want to use Dbmail

Harald: I chose Dbmail because it has a 100% MySQL-backend configuration, and the possibility to have a synchronized backup slave in the network, which you can stop at any time to make consistent snapshots for offsite backups without interrupting the mail server. We are using Dovecot as a proxy in front of Dbmail for several reasons:

  • it supports more auth mechanisms than Dbmail
  • it supports TLS/SSL directly
  • it supports replacements (% to @), since historically many users are configured with %
  • Postfix supports Dovecot directly for SASL auth, so you have the same auth mechanisms and encryption options for POP3, IMAP and SMTP
  • security: I think it would be hard to exploit Dbmail through Dovecot (whereas exploiting Dovecot directly seems harder, since it only handles the user logins)

We decided to migrate to Dbmail because we were running Apple servers (*brrr*) with the Eudora mail server, and I needed a replacement running on Linux/VMware hosts.

Since my main job is as a PHP/MySQL developer, a fully db-driven server gives me options to write special interfaces for all needs, run cron jobs for notifications and cleanups, implement auto-reply backends and many other nice things, all without touching text configurations.

Postfix is also nearly 100% MySQL-driven in our environment. There are great options for forwarders/aliases on both sides. If there is anything to do, you only have to figure out which of the two components can do what you want best, with the smallest side effects on the whole system.
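To give an idea of what a MySQL-driven Postfix looks like (a generic sketch, not Harald’s actual configuration: the database, table and column names here are invented for illustration), virtual alias lookups can be delegated to MySQL like this:

```
# in main.cf:
virtual_alias_maps = mysql:/etc/postfix/mysql-virtual-aliases.cf

# /etc/postfix/mysql-virtual-aliases.cf:
hosts = 127.0.0.1
user = postfix
password = secret
dbname = mailserver
query = SELECT destination FROM aliases WHERE source = '%s' AND active = 1
```

Postfix substitutes %s with the address being looked up at delivery time; with a map like this, adding or changing a forwarder is just a row in the database, with no postmap run and no daemon reload needed.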

Writing a PHP backend with 20,000 lines of code in a few weeks, while learning a lot about mail servers, was a really hard job. However, it has been running perfectly since the summer of 2009, with only a few “WTFs” and optimizations, and those were due to our limited know-how at the beginning.

This means that after two months of working day and night there was a complete solution, and for the second mail server the whole virtual machine was cloned in 2010 and needed only minimal configuration. The MySQL replication is a big improvement for backups; here is how we use it:

  • VMware-ESXi-Cluster
  • Mailserver on one host
  • Clone of the first machine on the second
  • Replication between both of them
  • the replication runs in a separate MySQL instance, read-only, on port 3307
  • the replication can be used by Postfix as a fallback, since read-only access is enough for that
  • on the backup machine a normal instance with a copy of the db is running…
  • …so you can start dbmail-imapd with this instance and access it directly via Thunderbird
  • once a week both mysqld instances are stopped and rsync-ed from the replication to the backup
  • before that happens, the last backup goes to “mysql-last-week”
  • once per day the replication is stopped and an offsite backup is made with rsync

So we permanently have access to the mailbox versions from last Sunday, we can switch one week back with a simple script, we can restore a customer with imapsync between the two machines, and we have a daily backup on the other end of the city. Through all the time it takes to do this, the mail services are not down for one second. I would not know how to do this with a file-driven mail server, because files are permanently being changed, and nobody knows whether a backup is clean enough until it is actually needed.
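The weekly part of the routine described above could be sketched, very roughly, as follows. This is a hypothetical script, not Harald’s actual one: the paths, instance names and service names are all made up for illustration, and DRY_RUN=1 (the default here) only prints each step instead of executing it.

```shell
#!/bin/sh
# Sketch of the weekly backup rotation for a replicated Dbmail/MySQL setup.
# All paths and service names below are invented; adapt them to your system.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = "1" ] && echo "would run: $*" || "$@"; }

REPLICA_DATADIR=/var/lib/mysql-replica   # read-only replication instance (port 3307)
BACKUP_DATADIR=/var/lib/mysql-backup     # instance dbmail-imapd can be pointed at

run rm -rf /var/lib/mysql-last-week                           # rotate: the previous weekly
run cp -a "$BACKUP_DATADIR" /var/lib/mysql-last-week          # copy becomes "last week"
run service mysql-replica stop                                # stop both instances so the
run service mysql-backup stop                                 # datadirs are consistent on disk
run rsync -a --delete "$REPLICA_DATADIR/" "$BACKUP_DATADIR/"  # replica -> backup copy
run service mysql-replica start                               # restart; the live master
run service mysql-backup start                                # was never touched
```

Run it with DRY_RUN=0, as root on the backup machine, to actually execute the steps; the live mail server stays up throughout.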

How to make a “multilingual” WordPress blog without multilingual plugins

Around October 2010 I migrated my bilingual websites Stop and Strider from Drupal to WordPress. Eighteen months later, WordPress has proven to be better than Drupal for my own needs, as far as those two websites are concerned (I still stick to Drupal for other websites).

There is, however, one part of those WordPress websites that has just become a big problem for me, and that is how to keep them (looking) multilingual. I think I have found a solution for it, and I have explained it here for two reasons. First, if I am right, this trick will probably be useful to many other WordPress users. The other is caution. I said I think I have a solution: I have done a little testing on a dummy installation (see below) and everything seems to work. However, before implementing it on the actual blogs, I’d really like to have some feedback from the community, just to make sure I am not forgetting something important (which I may very well have done).

When I started those two websites, I wanted them to have one name/base URL as simple and short as possible, so that I could just tell people “go to stop.zona-m.net” and they would immediately find the version in their language, versus “go to {it,en}.stop.zona-m.net”. English content would have URLs like stop.zona-m.net/2010/11/article-title/, while all Italian content would have the /it/ path prefix, as in stop.zona-m.net/it/2010/11/article-title/, and there would be automatic links from each version to the other. Drupal 6 does this, no problem.

Do you REALLY need a multilanguage blog?

When I moved to WordPress, the best way to keep this architecture with one WordPress install seemed to be, and very likely is, the WPML plugin with English as the primary language. After a while, however, I realized that WPML doesn’t seem to do everything I need and had in Drupal. This may be my own fault but, for example, I found no way in WPML to:

  • use language-dependent widgets, i.e. to tell WPML that a certain widget, e.g. an RSS feed, must be shown only in the English or the Italian version of each page (update 2011/04/15: Amir, one of the WPML maintainers, points out in the comments to this post that this is now possible with WPML, and it is explained here)
  • link to each other, as Drupal lets you do, the different names of each category in each language, in order to have automatic links between, for example, http://stop.zona-m.net/category/digiworld/ and its Italian equivalent http://stop.zona-m.net/it/category/digimondo/, or to have automatic preselection of the right categories in the new target language when you translate an existing article.

In addition to this, I also realized something more important. Very likely, no matter how cool they look, I don’t really need the links with the nice flags between the English and Italian versions of each page. By this I mean that my readers don’t really need them and almost never use them, because almost all of them use the website in only one language.

Finally, at the beginning of 2011, WPML announced it would become a commercial/proprietary product. Putting all this together made me decide to find another solution.

Making two copies of WordPress look like one bilingual website

Those two websites run on a Virtual Private Server, on which I can add databases, change Apache settings and so on as much as I want. This, and everything I explained in the previous section, made me consider splitting each of those websites into two independent, monolanguage WordPress installs. The result would be something a little less convenient for me to manage than it was with WPML, but nothing substantial, and almost transparent to all my readers.

As long, of course, as no URL changes in the migration! I want all the English URLs to remain the same (no problem), but I also do NOT want to “lose” or change the Italian URLs. Here’s how I have achieved this on a test install; please tell me if there are weaknesses!

I have set up a test bilingual blog at tips.zona-m.net, with the same versions of WordPress and of all the plugins as the real websites, in the /var/www/html/wp/tips directory, associated to the wp_tips MySQL database and with this Apache server configuration:


  <VirtualHost *:80>
      ServerAdmin marco@digifreedom.net
      DocumentRoot /var/www/html/wp/tips
      ServerName tips.zona-m.net
      AccessFileName .htaccess
      CustomLog logs/tips.zona-m.net.access.log combined
      ErrorLog  logs/tips.zona-m.net.error.log
  </VirtualHost>


Then I created a few posts in it, both in English and Italian, with links to each other, to have something meaningful on which to test the following method.

In order to split that bilingual WordPress blog into two independent ones without breaking any URL already created, first I cloned its MySQL database:


  # mysqldump -u root -p wp_tips > ~/dumb_wptips_bilingual.sql
  # mysql -u root -p
  > create database wptips_it;
  > use wptips_it;
  > source /root/dumb_wptips_bilingual.sql;
  > GRANT ALL PRIVILEGES ON wptips_it.* TO "sameuser_ofwptips"@"localhost" IDENTIFIED BY "password";


then I duplicated the whole, already working WordPress installation of tips.zona-m.net into another directory:


  cp -r -p /var/www/html/wp/tips /var/www/html/wp/tips_it


and modified the wp-config.php file in tips_it to point to the wptips_it database. Next, I modified the Apache configuration to “fetch” all and only the URLs starting with http://tips.zona-m.net/it from that second directory. To do this, it is necessary to create an alias in the httpd.conf file:


  <VirtualHost *:80>
      ServerAdmin marco@digifreedom.net
      DocumentRoot /var/www/html/wp/tips
      ServerName tips.zona-m.net
      Alias /it /var/www/html/wp/tips_it
      AccessFileName .htaccess
      CustomLog logs/tips.zona-m.net.access.log combined
      ErrorLog  logs/tips.zona-m.net.error.log
  </VirtualHost>


and to modify the .htaccess files in the two folders as follows:


  # more /var/www/html/wp/tips/.htaccess
  <IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteBase /
  RewriteRule ^index.php$ - [L]
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule . /index.php [L]
  </IfModule>
  # more /var/www/html/wp/tips_it/.htaccess
  <IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteBase /it/
  RewriteRule ^index.php$ - [L]
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule . /it/index.php [L]
  </IfModule>
  #


At that point, “all” that was left was to convince the cloned WordPress installation in tips_it and its MySQL database wptips_it that they now live at another base URL, with the /it path appended, that is tips.zona-m.net/it. Among the methods recommended in the official Change the WordPress URL page, the simplest/safest one in this case seemed to me the one called “Edit functions.php”. So I added these two lines to the functions.php file of the active theme:


  # cat /var/www/html/wp/tips_it/wp-content/themes/twentyten/functions.php
  <?php
  update_option('siteurl','http://tips.zona-m.net/it');
  update_option('home','http://tips.zona-m.net/it');
  /**
   * TwentyTen functions and definitions
   *


then, as the WordPress page above recommends, I loaded the admin page a couple of times, and immediately afterwards I removed those two lines. Last but not least, I deactivated English in WPML in the “Italian” blog and Italian in the “English” one. Done! Apparently everything works, as you can check for yourself by going to http://tips.zona-m.net and http://tips.zona-m.net/it. All the URLs created when there was only one bilingual WPML/WordPress install are still valid, with the same website URL: you can still visit both http://tips.zona-m.net/2011/04/here-we-go-with-my-second-english-post/ and http://tips.zona-m.net/it/2011/04/versione-italiana-del-secondo-post/, even though they are now served by different copies of WordPress. The only thing that has been lost (but, as I said, I do wonder if I really needed it) is the links between the several versions of each page.
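Incidentally, the same official documentation on changing the WordPress URL also lists a way to do this directly in the database, which would have been an alternative here: one UPDATE on the cloned database. These are the standard WordPress option names, assuming the default wp_ table prefix:

```sql
UPDATE wp_options
   SET option_value = 'http://tips.zona-m.net/it'
 WHERE option_name IN ('siteurl', 'home');
```

The functions.php route achieves the same result without touching the database by hand, which is why I preferred it.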

If this works, I will migrate the real websites to this structure, deactivate/remove WPML completely, then upgrade both of them to the latest version of WordPress, customize their layouts separately and go on. The only things I should lose are, I believe:

  • the possibility to handle posts in both languages from the same browser window. Worth the hassle for me, if it both maintains the old URLs and lets me customize each language at will
  • clear signs for Italian visitors that there is also an English version of each website, and vice versa. I feel that just adding a link in the top menu bar to the home page of the other language should be enough, since I’m pretty confident that almost nobody so far has actually used both languages
  • (totally wild guess, not sure this is really an existing issue) search engines temporarily pissed off, de-ranking each post because it no longer has links from the version in the other language?

What do you think? Have I missed something? Is there any security hole or other weakness that will come and bite me if I migrate the real website? Should I do something else? Please let me know!

How to reject spam from certain countries (if you must really, really do it)?

Every now and then, a question like this pops up on some email server management forum:

  I'd like to be able to reject connections from remote IP addresses if they're from certain countries.

The usual reason is either that…

How to create stacked area graphs with Gnuplot

Gnuplot is a really great plotting utility that can be used either interactively or automatically, from inside scripts of all sorts. However, sometimes it can be quite difficult to use, simply because there is a lot of documentation, and it is hard to figure out exactly which piece of it you should read and where it is.

This is a big problem, because the way you plot data, that is which Gnuplot options you set, can make a huge difference in the readability of the plot. For example, I had this ASCII file called initial_data.dat, that lists the number of daily visitors to four different websites I follow (the complete file used for this article is downloadable from the link at the end of this page):


  20110101 2481 89 896 98
  20110102 2341 83 1341 762
  20110103 560 208 1795 890
  20110104 936 409 1665 419
  20110105 534 562 937 341
  20110106 171 728 953 612
  20110107 200 199 1297 569
  20110108 521 557 990 295
  20110109 1592 227 535 466
  20110110 1363 211 21 299
  20110111 437 222 110 302
  ...


The file format is very simple: the first column is the date, then there are four space-separated columns, one for each website. However, if you tell gnuplot to plot those numbers as simple lines with these commands (the plot command here is wrapped for readability, but it must stay all on one line!):



  set terminal png size 1400, 600
  set output "with_lines.png"
  set key center top
  set style fill solid
  set xdata time
  set timefmt "%Y%m%d"
  set format x "%b %d"
  set ytics 300
  set y2tics 300 border
  set xlabel "Website traffic from 20110101 to 20110322"

  set style line 1 linetype 1 pointtype 0 linewidth 1 linecolor 6
  set style line 2 linetype 2 pointtype 0 linewidth 1 linecolor 7
  set style line 3 linetype 3 pointtype 0 linewidth 1 linecolor 8
  set style line 4 linetype 4 pointtype 0 linewidth 1 linecolor 9
  plot ["20110101":"20110131"][:] 'initial_data.dat' using 1:5 t   
                                  "Website 4" w lines linestyle 4, 
   'initial_data.dat' using 1:4 t "Website 3" w lines linestyle 3, 
   'initial_data.dat' using 1:3 t "Website 2" w lines linestyle 2, 
   'initial_data.dat' using 1:2 t "Website 1" w lines linestyle 1


you’ll get the diagram saved in with_lines.png, which isn’t really informative: the lines continuously overlap, so it’s hard to see at a glance which website had the highest traffic on each day. It’s even harder to see the total amount of traffic on each day. The solution is to stack the graphs, that is, to use each line as the foundation on which to draw the next. In other words, instead of drawing four lines, one for each website, I wanted to draw:

  • traffic for website 1, using the numbers in the second column as they are in the data file
  • traffic for website 2, summing its numbers in column 3 with the corresponding numbers in column 2
  • traffic for website 3, summing its numbers in column 4 with the corresponding numbers in columns 2 and 3
  • traffic for website 4, summing its numbers in column 5 with the corresponding numbers in columns 2, 3 and 4

It is possible to do this kind of thing directly in Gnuplot, by calling utilities like sed or awk from within the Gnuplot command file, as explained in this page. However, in my opinion it is much better and cleaner to do it before calling gnuplot, with an auxiliary script in Perl, Python or whatever strikes your fancy. The reason is that this way you can pre-process the numbers in much more flexible and self-documenting ways than Gnuplot is capable of; another way to say this is that it is better to keep number crunching and number plotting completely separate. For example, I also run an expanded version of the script below that calculates and stacks on the fly the one-month moving averages of the traffic to each website.
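For what it’s worth, if awk is the auxiliary tool that strikes your fancy, the same stacking transform can be done with a short standalone command. Here is a sketch, made self-contained with two sample lines of input (the real initial_data.dat obviously has many more):

```shell
# two sample lines in the initial_data.dat format, to make this self-contained
printf '20110101 2481 89 896 98\n20110102 2341 83 1341 762\n' > initial_data.dat

# turn each traffic column into the running total of itself
# plus all the columns before it, then save the result
awk '{ printf "%s", $1
       total = 0
       for (i = 2; i <= NF; i++) { total += $i; printf " %d", total }
       print "" }' initial_data.dat > stacked_data.dat

cat stacked_data.dat
# 20110101 2481 2570 3466 3564
# 20110102 2341 2424 3765 4527
```

As you can see, the first column stays as it is, while each following column becomes the sum of itself and all the previous traffic columns.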

In this case, using a separate script to “stack” the data is also quick. All I had to do to rearrange the numbers as described above was to feed the data file to another script and save the result as another data file:

cat initial_data.dat | perl stack_data.pl > stacked_data.dat

Here is stack_data.pl, a simple Perl script:


  #! /usr/bin/perl

  use strict;
  use warnings;

  my $I;
  while (<>) {                              # read the data file from Standard Input
      chomp;
      my @FIELDS = split ' ', $_;           # put each column in a different
      my $DATE = shift @FIELDS;             # field of the @FIELDS array
      print "$DATE ";
      my $LINE_TOTAL = 0;
      for ($I = 0; $I <= $#FIELDS; $I++) {  # re-print the data, but making of
         $LINE_TOTAL += $FIELDS[$I];        # each column the sum of all the ones
         print " $LINE_TOTAL";              # preceding it
      }
      print "\n";                           # print the new numbers to Standard Output
  }


Here is what you get (stacked_lines.png) when plotting the stacked_data.dat file with the same style as the initial graph, that is, with the same “plot” command. That is much better but, in my opinion, it still wasn’t good enough. Personally, I think filled areas are more informative in cases like this. The way to obtain them is to use another style in the plot command. Replace the plot command in the Gnuplot instruction file with this one (again, all on ONE line!):


  plot ["20110101":"20110131"][:]
  'stacked_data.dat' using 1:5 t "Website 4" w filledcurves x1 linestyle 4,
  'stacked_data.dat' using 1:4 t "Website 3" w filledcurves x1 linestyle 3,
  'stacked_data.dat' using 1:3 t "Website 2" w filledcurves x1 linestyle 2,
  'stacked_data.dat' using 1:2 t "Website 1" w filledcurves x1 linestyle 1


And this is what the plot will look like (with_filledcurves.png). Nicer, eh? Please note that the order in which each column is read and plotted in the command above is important. When Gnuplot draws a new line or area, it draws it over what it has already drawn or painted. With lines (the second diagram on this page) this makes no practical difference. With areas, instead, it’s essential to paint them in the right order. This means that you must plot first the highest area (“Website 4” in our example), that is, the one that is the sum of all the original columns; then comes the one that is the sum of only the first three columns, and so on.