Why and how the OpenDocument format can save you a lot of time!

The OpenDocument Format (ODF) is an internationally recognized open standard for digital office documents whose importance has also been acknowledged by Microsoft. ODF is good for a lot of reasons I have already explained in Everybody’s Guide to OpenDocument. However, there is also one more reason why ODF is great for everybody who must produce a lot of office documents, one that will be the subjects of many posts on this website: ODF is really simple to generate or edit automatically. Even if you aren’t a professional programmer, it takes very little effort to put together a script that generates or processes in any way texts, presentations or spreadshets in ODF format.

How the openness of ODF makes automatic generation of documents much simpler

Very often, we use computers to produce many different versions, every time with new data, of some reference text, presentation or spreadsheet. Changing those kind of files manually makes sense only if it happens once in a while. When it’s a regular activity, instead, it can become a huge waste of time. ODF, however, makes it very quick and easy to insert raw data into texts, spreadsheets or presentations with the slightest possible amount of manual work and without even running OpenOffice. This is possible because an ODF file is just a ZIP archive, with pictures and macros in their own folders and the actual text written, in XML format, inside a file called content.xml. Therefore, in order to create a new, 100% compliant ODF file with different data, tables or images, you only have to open the archive, process the text inside content.xml or put new pictures in their folder if necessary and zip everything again. You must only use OpenOffice once, to create a template by hand if you if you don’t find a suitable one online.

The power of script-based ODF processing

You could perform repetitive generation and editing of ODF office files even manually, with a text editor like Notepad, Emacs or VI. The real power of ODF, of course, is in the fact that you can (and should) do all that processing automatically, with very simple shell or Perl scripts, that is with tools that are included in any Gnu/Linux distribution but can also work on Windows and Mac. The main advantages of this approach to office document processing are:

it works even without Openoffice, so it could even run on a server
there is no need of any relational database but you can use one if necessary
learning to do these things with shell scripts instead of OpenOffice macros:
gives you skills that you can reuse in a lot of other contexts.
allows very easy integration with other command line tools, from cron jobs to mass mailing, chart generation with Gnuplot or scaling, watermarking, framing of all images in a document with ImageMagick
above all, it’s much simpler (and faster) than you’d think!

The last point is the most important. Using the method explained here everybody with just a basic grasp of shell scripting can generate, modify or analyze hundreds of ODF text documents with just a few minutes of easy coding.

Of course, this approach is not really flexible, scalable or really robust, unless you add lots of code for error management, but the idea here is not to develop industrial strength solutions. If that’s what you need, you’ll have to either use real XML based tools like Odfpy or go straight to the source, the book OpenDocument Essentials by J. David Eisenberg, that you can also purchase at Lulu.com.

This said, there are tons of cases where heavyweight tools like those aren’t worth studying, installing and deploying, but people still end up wasting many hours on repetitive edits. Learning how to write quick and dirty shell scripts that can open and update an ODF file is an easy but huge time saver in such situations.

What’s next?

Here are some of the ODF scripting recipes that you’ll find on this website in the next days (but if there are other recipes that you would like to see published, just ask and if possible I’ll write them!):

create ODF text invoices from databases or plain text files
create multiple choice texts printouts, each with a different set of randomly chosen question
transform plain text outlines in simple presentations
fill spreadsheets with data taken from databases or system logs

(note: some of these posts are updated excerpts of articles originally written for Linux Format, and are republished here with their permission)