Major gaps of Open Office Impress versus Microsoft Power Point, what do you think?

Yesterday Sergio, a user of OpenOffice Impress, sent to the OpenOffice.org discussion list his list of the “Major Gaps of OpenOffice Impress 3.3 vs. Microsoft Office PowerPoint”.

Sergio compiled the list because, as much as he likes OpenOffice, “after struggling for over 1 year, sadly he had to stop using Open Office Impress and go back to Microsoft Power Point”.

Personally, I have experienced and can confirm most of what Sergio lists as “File Processing issues”. I haven’t encountered the other problems, but that may be because I use Impress very little these days, and I only need it for very simple slideshows. I don’t even know yet, for lack of personal experience, if and how the current LibreOffice version of Impress would be different. However, I am very curious to know if such differences exists. Above all, since I strongly support the OpenDocument format used by OpenOffice, LibreOffice and many other software programs, I want these issues to be solved.
Therefore, after speaking with Sergio, I’ve reformatted his report and put it here where it’s easier to find it than as a mailing list attachment, and easier to comment without subscribing to a mailing list. Your feedback is welcome!

Impress File Processing issues

  • Slow speed of processing even with high efficiency PCs (major problem !): Many tasks are performed very very slow !
  • Cutting slides: very, very slow
  • Copying and pasting slides from one impress file to another: very, very slow
  • Acquiring a slide change, even in the text: quite slow
  • Saving files: very slow
  • Opening files: very slow

Copy and past slides from one impress file to another

When a graphic is present in the slide layout, it gets deleted when the slide is pasted and copied in the destination file (major problem).
the color format of the slide in the source file gets changed when the slides gets pasted in the destination file (In PowerPoint, when you paste the slide in the destination file you are asked whether to retain the original format, including colours, layout graphic, etc.)

Changing page (slide) (to the following or to the previous one) in normal view

In “normal” view, it is not possible to shift easily to one page to the following or the previous one, using for instance the side scroll bar or the mouse scroll wheel. This is possible only when the zoom size of the page/slide very small, not with operative size. You have to necessarily click on the new slide into the left frame with the miniatures slides. This is very cumbersome.

Icons view

It is not possible to view all the icons of the formatting toolbar, unless you set a very large window size. Please allow to arrange the toolbar in 2 lines, even when it is integrated in the menu bar.
Please allow to change the order of icons within a toolbar.

Formatting in OpenOffice.org Impress

Bulleted list: I can’t set easily and automatically a space or a tab between each bullet and the first character of the paragraph (this option is present in “Open Office ” Word)

Increase or decrease indent of a paragraph or a bulletted list: I can’t let the icon left-to-right or right-to-left appear in the Formatting Toolbar, and therefore it is difficult to increase or decrease the indent (this option is present in OpenOffice Word)

Multiple selection of non-consecutive text: it is not possible, within the text in a same text cell, to select multiple, non-consecutive words or sentences or different non-consecutive sentences of a bulletted list (these options are possible in Open Office Word using “CTRL”),
Similarly, within a table, it is not possible to select multiple, non consecutive words, or sentences or cells (this is possible in Open Office Word using “CTRL”).

Formatting multiple text cells at the same time: after you select multiple text cells, the tool bar “Formatting” disappears. Therefore, you have to go to the Edit toolbar or right click and make one change at a time in the text format, which is very time-consuming.

Formatting tables: there is no way to select a column or a line just putting the cursor at the top of the column or before the line.

Changing the column width: putting the cursor onto one column border (starting from the second column from the left), clicking and dragging it in order to enlarge or reduce the column width: there is no way to retain the original width of the side columns (this is partly possible in Open Office Word by clicking at the same time the CTRL).

When the file is saved and re-opened, especially when an Impress file is saved as Microsoft PowerPoint and then re-opened as Impress file, tables gets often increased in line-spacing (very difficult to reduce back) and, consequently, in the overall height, so that they often get outside the slide (major problem!)

How to automatically print or convert to PDF, MS Office or other formats OpenDocument files

The script and tricks in the ODF scripting section of this website show how to create office-ready texts, presentations and spreadsheets automatically, in the OpenDocument format, which is a worldwide standards. This is all many people need to work today. Sometimes, however, it’s still necessary to either print those documents, or exchange them to somebody in other formats, like PDF or those of the older releases of Microsoft Office (newer releases of this program are already partially compatible with OpenDocument through free plugins, so if your partners have those versions they should really use those plugins, instead of bothering you with requests for drug-like, legacy file formats, but that’s another story).

Of course, if you only need to print or convert to other formats only once in a while there’s no reason to not do it from OpenOffice. The simple tricks explained below, however, are a life-saver when you need to do this many times, and of course you’d like your computer to do it for you while you have a coffee or something.

On Linux systems it is easy to do all this, and even send the converted files via email, automatically. Let’s assume that you have an OpenDocument text, spreadsheet or presentation already sitting in some folder, waiting to be processed.

Both printing and conversion to PDF, HTML or MS Office formats from the command line need OpenOffice to work. In the second case, the reason is that what makes the actual work is one of the OpenOffice macros linked below: when you launch OpenOffice, it executes that macro on the file indicated by the user and then exits. Macros are not needed for printing because OpenOffice has dedicated options for that. Usage of OpenOffice from the command line is explained on the OOo wiki. In a nutshell, this is the correct syntax:


  soffice -invisible macro://path-to-macro($FILE)


On some systems, you may need to provide the complete path to the soffice program. The -invisible option is what makes OpenOffice start without a graphical interface. The file to process must be passed as argument ($FILE) to the macro.

The command above is all you need if you are working on a complete Gnu/Linux desktop, that is a system that also has a graphical interface server (called X server). For the record, you can do the same thing in Windows with a batch script like this (taken from an OOoforum thread):


  @ECHO OFF

  "c:program filesOpenOffice.org1.1.4programsoffice" -invisible "macro:///Standard.Module1.ConvertToPDF(%1)"


When you want to work inside a Linux Web or print server, instead, that is on a computer where X was never installed, you need to set up some extra variables before launching OpenOffice, otherwise it won’t start. This is how to do it (the explanation for the extra commands are in the thread in which I found them, which also includes instructions on how to install OpenOffice on a (remote) server:


  export PATH=$PATH:/usr/bin/X11
  export LANG=en_US
  export HOME=/var/www
  xvfb-run -a /usr/bin/soffice -invisible macro://path-to-macro($FILE)


Please note the extra piece in the actual command, that is in the last line above: `xvfb-run -a`. Xvfb is a smaller X server used in special situations like this, when a full X wouldn’t be installable. Also, don’t forget that, depending on the server configuration and your actual needs, you’ll probably have to change the LANG and HOME variables.

Show me the macros!

The previous paragraph explains how to run OpenOffice from the command line on Linux or Windows in order to execute any macro. Let’s now look at the actual macros we need to print or save in Microsoft or other formats. There are several ones available online.

Those with the best explanation, which includes details on how to install any macro in OpenOffice, are SaveAsPDF and SaveAsDoc. The beauty of these macros is that it is very easy to modify them to save in HTML or any other format that OpenOffice can handle! You just have to substitute the right values for the file extension (MYEXTENSION) and the filter name (MY_FILTER_NAME) in this part of the macro:


   cFile = Left( cFile, Len( cFile ) - 4 ) + ".MYEXTENSION"
     cURL = ConvertToURL( cFile )

     oDoc.storeToURL( cURL, Array(_
              MakePropertyValue( "FilterName", "MY_FILTER_NAME" ),)


Another macro that saves an OpenDocument file in PDF format was posted to the Fedora mailing list. Whichever macro you choose, put it in a suitable folder, accessible from the script and user account that will use it, and replace the path-to-macro string above with the actual full path to the macro in the file system.

How to print or email OpenDocument files from the command line

In order to do this we just need two other command line options of OpenOffice (see here for the complete list or type `soffice -?` at a command prompt to get a complete listing):


  soffice -invisible -p <documents...>
  soffice -invisible -pt <printer> <documents...>


They both print all the specified documents. The only difference between them is that the first one uses the default printer, the second looks for the printer given as first parameter.

Finally, if you also want your script to email on your behalf the files that it generated in this way, you can use the text-based Mutt email client in this way ($EMAIL_TEXT is a separate text file containing the text of the message):


   mutt $RECIPIENT  -s "$UBJECT" -i $EMAIL_TEXT -a $FILE_TO_BE_ATTACHED


if you find any error in this page or have any suggestion, please tell me (but remove the numbers from the email address first!)

Generate OpenDocument spreadsheets from DB2 (or any other) database

DB2 pureXML is IBM software for management of XML data that eliminates much of the work typically involved in the management of XML data.The OpenDocument Format (ODF) is an open international standard for office texts, presentations and spreadsheets that is very simple to process or generate automatically. This page is a short synthesis of an article published in September 2010 by N. Subrahmanyam, Using DB2 pureXML and ODF Spreadsheets, to give an idea (see my comments at the end) of how flexible ODF scripting is. Please read the original full article to know how to actually generate ODF
documents from DB2 pureXML files
.

Selected quotes from “Using DB2 pureXML and ODF Spreadsheets”

  • We will… demonstrate how to use DB2 pureXML with ODF documents. In the remainder of this article we will consider an ODF spreadsheet of a simple shopping list containing item names, units of measurement (UOM) and the number of units required.
  • We are considering only spreadsheets because it is easier to co-relate the rows and columns of the spreadsheet document and that of a DB2 table… (but) other ODF documents such as word documents, presentations, formulas, etc. could also be handled by DB2 because the ODF specification refers mainly to an archive of XML documents.
  • We will demonstrate how to:
    1. Import the content.xml document of this ODF spreadsheet into an XML column within a DB2 table
    2. Extract information from the content.xml document into relational format.
    3. Merge data from a relational table with the content.xml document in the XML column to produce a new spreadsheet.
  • One way of building the content is to start from scratch and building the whole (ODF file). This approach would be quite tedious. A better way would be to generate the relevant part in the content.xml part and then insert into an otherwise empty content.xml document. This approach will also allow us to make use of the XQuery Update facility in DB2 pureXML.
  • While generating the spreadsheet, one can write queries to generate the meta.xml document with correct values for creator, creation time-stamp, etc
  • This article can even be further extended into an on-line document editing software with .ods as file format and DB2 pureXML as the database.

What does all this mean?

Me, I have never used DB2. In spite of this (or maybe just because of this?), there are at least three things of general interest in Subrahmanyam’s piece that I want to point out. The first is that the second bullet above applies to any kind of database, even if XML-oriented ones like DB2 may make the job easier. The second is that all the real work to generate the ODF spreadsheet, as you can check yourself if you read the article is done directly inside the database, that is through DB2 database queries. The only exception is simple housekeeping stuff, like saving everything into a file and properly insert that file inside the (ODF) ZIP archive that constitutes the spreadsheet. If you already know DB2, there is no need to learn some other obscure language or super-complicated format (the whole article is 16 pages including abstract, bio, resources, full source code and 6/7 pictures). In other words,
this is another proof of how flexible and useful the basic ODF scripting approach is.

Last but not least, please look again at the last bullet above: there’s no reason why this kind of easy ODF hacks should remain inside your own organization. If you want to generate dynamically from a database, for all the visitors of your website, spreadsheets and other office documents in an editable format that is immediately usable by many office suites (including Microsoft Office), ODF is the way to go. Thanks to N. Subrahmanyam for this article!

Create OpenDocument invoices and other documents with Rexx

After my talk about ODF scripting at OOoCon 2010 I got by another OOoCon speaker, Rony G. Flatscher another script for automatic generation of OpenDocument invoices, or any other ODF text with a fixed structure. Roni’s script opens an OpenDocument text template as the one shown in the left picture below and replaces all the MF_ placeholders strings with values loaded by a plain text file, creating the filled form shown in the right picture.

This is the format of the plain text file containing the data that must be inserted in the form (of course, you could either generate this entry file manually, but the real advantage of doing things this way is when the data come from some database, spreadsheet or other similar source:

  ; empty lines or ones starting with a ; are considered comments
  ; field separator is the string /;

  ; MF_NAME          MF_STREET           MF_ZIP      MF_CITY     MF_COUNTRY
  Marco Fioretti    /; Via Gioiosa 43 /; I-19372  /; Rome        /; Italien/Italia
  Rony G. Flatscher /; Augasse 2-6     /; A-1090   /; Wien        /; Austria

Unlike mine, Roni’s script is written in the Rexx language and uses the Java-odftoolkit. The way to run the script from the command prompt or within another script is the following:

rexx test4marco-fioretti.rex test4marco-fioretti.odt test4marco-fioretti.txt

the first argument after the invocation of the Rexx interpreter is the actual script (which is barely over 100 lines, and half of them are comments! You can read the full source at the bottom of this page), the second is the template file (the one in the left picture above) and the last one is the data file. You can download the Rexx script, the sample data file, the OpenDocument template and the resulting document here. I haven’t been able to test the script myself yet (I’ve messed a bit too much with my system lately and will have to clean it up…) but it looks fine to me and of course I’ll update this page with any bug fix or missing info that should be needed. Here is Roni’s explanation of what his script does:

Rony: following up our little conversation at OOoCon I researched today the Java-odftoolkit and tried to come up with a simple solution that mimickries your approach to changing the content of ODF files. This ooRexx program does basically the same as your bash/Perl scripts, but takes advantage of the Java-odftoolkit without using Java’s strong typing (blessfully). I have added in the comments at the top of the script the instructions to install the required libraries on Linux and Windows in case they aren’t on your system yet. If you’re unfamiliar with the language, just remember that the tilde is ooRexx explicit message operator (left of the tilde is the receiving object, right of it is the name of the message, which may be followed by round parenthesis which contains arguments). Last but not least, should you experience problems, please do not hesitate to contact me (but remember to remove all the anti-spam digits from the email address!!!).

Isn’t OpenDocument wonderful?

I’m very happy to host here Roni’s script and explanation, because he’s proving what I thought when I started the ODF scripting section of this website and explained in my OOoCon talk: the OpenDocument format is not only really open, it’s also so simple that everybody can save lots of tedious, manual office work thanks to it, no matter which language he or she prefers! If you have other examples of OpenDocument scripting, please let me know, I’ll be glad to host them here or link to them!

/*
   Author:  Rony G. Flatscher
   Purpose: Demonstrate another possibility to script ODF using ooRexx with the
            Java camouflaging support of BSF4ooRexx (makes Java typeless and look
            like ooRexx)

   Inspired by Marco Fioretti's presentation at the 2010 OpenOffice Conference in Budapest,
            cf. <http://www.ooocon.org/index.php/ooocon/2010>

   Needs:   - ooRexx (FOSS):
              - Linux: a build of ooRexx with a revision >= 6133, from <http://build.oorexx.org>
              - Windows: <http://www.oorexx.org/download.html>

            - BSF4ooRexx (FOSS):
              - <http://wi.wu-wien.ac.at/rgf/rexx/bsf4oorexx/current/>
              - read the "readme*.txt" file of your platform for installation instructions

            - the Java odftoolkit 0.8.6 as of July 2010, downloaded from:
              <http://odftoolkit.org/projects/odfdom/downloads/download/current-version%252Fodfdom-0.8.6-binaries.zip>

              directions:
                unzip -j odfdom-0.8.6-binaries.zip odfdom-0.8.6-binaries/odfdom.jar
                ... will extract the "odfdom.jar" Java archive (a zip archive), which needs
                    to get added to the CLASSPATH environment variable for Java to find the
                    contained classes

                Linux:
                   - export CLASSPATH=`pwd`/odfdom.jar:$CLASSPATH

                Windows:
                   - set CLASSPATH=%cd%odfdom.jar;%CLASSPATH%

            - the Java XSLT processor "Xerces" from the Apache Software foundation, downloaded from:
              <http://www.ecoficial.com/apachemirror//xerces/j/Xerces-J-bin.2.10.0.zip>

              directions:
                unzip -j Xerces-J-bin.2.10.0.zip xerces-2_10_0/xercesImpl.jar
                ... will extract the "xercesImpl.jar" Java archive (a zip archive), which needs
                    to get added to the CLASSPATH environment variable for Java to find the
                    contained classes

                Linux:
                   - export CLASSPATH=`pwd`/xercesImpl.jar:$CLASSPATH

                Windows:
                   - set CLASSPATH=%cd%xercesImpl.jar;%CLASSPATH%

   To run:  rexx test4marco-fioretti.rex test4marco-fioretti.odt test4marco-fioretti.txt
*/

parse arg odtFilename customerFilename . -- get command line argument, make sure last argument is stripped as well

   -- load the Java class "OdfPackage" and use it for manipulating the ODF-files
odfPackageClz=bsf.loadClass("org.odftoolkit.odfdom.pkg.OdfPackage")  -- import the Java class object
odfPackage   =odfPackageClz~loadPackage(odtFileName)  -- create and fetch a package object

   -- read content.xml file
packagePath="content.xml"              -- file from the package
mediaType  =odfPackage~getFileEntry(packagePath)~getMediaType  -- needed for save operation later
bytes=odfPackage~getBytes(packagePath) -- get stream as a byte array (as raw bytes)
rexxString=BsfRawBytes(bytes)          -- convert Java byte array to Rexx string of raw bytes/octets

   -- read and parse customer file
a=readCustomerFile(customerFileName)   -- returns an array of directories, each representing a parsed customer

   -- get and set todays date
date=.dateTime~new~europeanDate("/")   -- get current date in dd/mm/yy (use "/" as a delimiter)

   -- create as many odt-files as we have customers in the txt-file
loop i=1 to a~items        -- loop over array
   newString=changeValues(rexxString,a[i],date) -- carry out the changes
   newBytes=BsfRawBytes(newString)        -- convert Rexx string to Java byte array
   odfPackage~insert(newBytes,packagePath,mediaType)  -- write back data (replaces old content!)
   odfPackage~save("new_"i"_"odtFileName) -- save under a new name
end
odfPackage~close                       -- close the package object, we are done

::requires BSF.CLS   -- load the ooRexx Java support (make Java typeless and look like ooRexx)

::routine changeValues        -- carries out the replacements in the content.xml text
  use arg string, dir, date

  mb=.MutableBuffer~new(string)  -- use the ooRexx MutableBuffer in case we have plenty of string changes
  mb~changeStr("MF_DATE",    date)
  mb~changeStr("MF_NAME",    dir~MF_NAME~strip)
  mb~changeStr("MF_STREET",  dir~MF_STREET~strip)
  mb~changeStr("MF_ZIP",     dir~MF_ZIP~strip)
  mb~changeStr("MF_CITY",    dir~MF_CITY~strip)
  mb~changeStr("MF_COUNTRY", dir~MF_COUNTRY~strip)
  return mb~string               -- return a plain string

::routine readCustomerFile    -- reads and parses a customer file
  use arg filename

  deli="/;"      -- delimiter string for columns
  a=.array~new    -- array to collect the parsed lines from the file
  loop while chars(filename)>0   -- as long as there are unread characters loop
     line=linein(filename)
     if line="" | left(line,1)=";" then iterate    -- a comment or empty line, ignore it
     d=.directory~new   -- directory to contain the parsed values of the individual lines
     parse var line d~mf_name (deli) d~mf_street (deli) d~mf_zip (deli) d~mf_city (deli) d~mf_country
     a~append(d)  -- save directory object in array
  end
  return a        -- return array of directory objects, each representing a customer

OpenOffice.org Conference 2010, preparing the next ten years

The OpenOffice.org conference celebrating the tenth birthday of OpenOffice.org started in Budapest yesterday morning. Here are some first notes from the field.

The opening session was a cool moment, both for the location (the Hungarian Parliament) and for the content. We started in the very hall of the Parliament. Incidentally, the first thing I noted there has nothing to do with OO.o but is a general problem of the FOSS and programming worlds: of about 150 people in the hall, no more than 10% were women, even if OO.o and FOSS users aren’t certainly 90% males, are we? But I digress.

The official conference brochure starts with a welcome message of the ODF Alliance Hungary noting “how appropriate it is that such a conference takes place in a region filled with resonance from the collapse of the totalitarian regime of the former Eastern block” that is an example of closed society replaced by a much more open one, and in a University that is based on an open society vision, open to improvement whose values evolve through trial and error, just like OSS. Dr Zsolt Nyitrai, State Secretary of the Ministry of National Development, greeted participants pointing out how the current government of Hungary fully acknowledges the importance of FOSS and the OpenDocument Format (ODF), proved by several practical experiences:“We encourage you to help us to open the closed doors of administration in the world of Open Source office software” (see also the official press release). Professor Rev also gave a really interesting speech, but since it’s much more general than OO.o or FOSS, I’ll write about it on another website.

The day before the conference, some other participants had told me they were expecting with great interest the other keynote of Michael Benner, vice president, Oracle Office GB: “let’s hear what Oracle wants to do with OpenOffice.org…”. The answer, as far as Benner opening speech goes, was reassuring:

  • OpenOffice.org is a good fit for this company
  • Oracle OO will be highly integrated with other Oracle products
  • Our business units offering include:
    • Oracle OpenOffice server
    • Oracle ODF plugin for MS Office
    • Oracle premier support for the products above and for OOo
    • more to come…

Florian Schiessl explained what made the Munich’s conversion to OpenOffice.org work: maniac attention to detail and patience. They looked at some 21000 different templates and macros one by one and converted each of them manually, but only when they were sure they couldn’t be abandoned, eventually reducing their number of about 40%. More info is at Wollmux. They had problems when they sent ODF files to other organizations that had never seen them before, but Schiessl’s suggestion is “do talk with your partners when they refuse ODF and there will be good results and simplification for everybody, for example like using MS formats, but abandoning MS-only macros because they were not necessary in the first place”. (of course, being one of the largest cities in Europe helps a lot in this approach… single users still have less
opportunities to be heard
).

Miklos Banai of ODFA Hungary closed the morning with a very interesting question: “Europe has a bigger population and GDP than the USA, yet Microsoft revenues here are around 10 billion usd/year, with a operating income/profit around 7 BUSD/year. This with OOXML that seems a standard of an artificial world of robots with artificial intelligence. I wonder if there is any human able to rebuild it with only the 6000+ pages of the OOXML spec… Is this the best deal for Europe?” For these reasons, Banai concluded, “the European Union should change for a younger, more valuable economical, flexible and secure solution for document creation”.

OpenOffice or OpenDocument?

By looking at the conference program one may wonder “is this an OpenOffice or OpenDocument conference?” Being there, I can confirm that, even if OO.o surely remains at the center of the stage, there is indeed a lot of interest in the OpenDocument format in and by itself, even outside of the single talks devoted to it. Louis Suarez-Potts, OO.o Community Development Manager, pointed out how in the next years it’s important to focus on ODF, since focusing only on OO.o doesn’t go very far, especially in Government circles. It is also crucial, he said, to make easier for more programmers to join development and to keep the whole community self-sustainable. Size in and by itself doesn’t really mean much (“think how General Motors ended”, Louis noted). What matters is “to not rely on any single company or language group: “a global community does not privileges one language, one nation…”. Louis final comment on the state of OO.o was:
The first 10 years were only setting the stage and clearing our throat. Real action starts now”
.

I’ve seen many cool things in the first two days of the conference. One I liked a lot were the free and commercial extensions developed by EuroOffice. They include map-based charts, interaction with GoogleEarth, an education tool to generates interactive diagrams with orbits and other informations about planets and eco-friendly printing (still experimental) that erases backgraounds or large images and changes text color to black. The best one for me is the Planet tool, because it proves something I really want to investigate in the next months: the potential of OpenOffice.org as an educational platform, that is a tool to build interactive courseware.

Speaking of ODF

I spent day 2 of OOOcon only looking at, or speaking about… the OpenDocument format. In the ODF interoperability demo, Inge Wallin of KOffice created a letter with KWord, associating to his own name in the text his phone number, hidden into an RDF variable. He then sent the letter by email to a colleague who, in real time, opened it on his smartphone with FreOffice. Working in RDF mode, Freoffice realized that “Inge” wasn’t a normal string, showed his phone number and, after one click on it, Inge’s cell phone started ringing at the other side of the table. Rob Weir showed how Mathematica can generate math formulas directly usable in OpenDocument while Jos van der Oever suggested that future versions of ODF may move to the Web, using JavaScript for macros and CSS for styiling. When I was asked what I’d like to see in future versions of ODF, my gut reaction was “please leave it as it is, it’s already good enough!”. On a second thought, I’d like to see
ODF do what the Universal Business Language was developed for (support automation of B2B financial transactions) and sentence-level cross-referencing, to stop saying things like “look at the 3rd paragraph on page 20″ in an era where documents are often not printed, but displayed on screens of all possible sizes.

Another very interesting moment of the day was the “Building Bridges” talk by Moritz Berger of Microsoft. He explained why he thinks that it is wrong to promise 100% roundtrip fidelity (and I fully agree with him here) but there are plenty of good reasons to keep using both OOXML and ODF, that is two standards for the same type of files. He also explained MS ODF imlementation priorities in Office 2007, 2010 and beyond. They are, from first to last:

  • adhere to ODF standard
  • be predictable
  • preserve user intent
  • preserve editability
  • (last) preserve visual fidelity

Me, I explained why I’m sure that ODF scripting is both a simple, huge time-saver and a good way to convince more people to use OpenDocument and OpenOffice.org. In the next days, both my talk (and more first-hand news from OOOcon 2010) will be posted here, so stay tuned!

How to make OpenDocument slideshows out of plain text files

Slideshows are extremely popular as presentation and educational tools, but have a couple of serious problems. The first is readability: let’s admit it, many slideshows are almost unusable. One of the secrets to useful slideshows is terseness. Each slide should contain only a few short points or pictures which summarize the key concepts you want to transmit to the audience with that part of your talk.

The other big issue with slideshows is that GUI presentation software, be it PowerPoint, OpenOffice Impress, KPresenter or anything else, can be quite time-consuming and distracting, no matter how you use it. Writing bullets and sub bullets as simple text outlines is much faster, even when you’re just pasting together notes you scrabbled on your PDA, email fragments, quotes from Web pages or thoughts of the moment.

If you need to produce slideshows and think that the cleaner they are the better, but don’t like the time it takes to put them together in a GUI, here’s a solution. Like any other ODF document, OpenDocument slideshows are very easy to generate and process automatically. Besides, using the approach below instead of LaTex and friends has one big advantage: the end result is a file that you can pass around to everybody, including users who can only handle traditional office suites and maybe need to edit the slides, but wouldn’t touch any manual markup with a ten feet pole.

This said, there is one big difference between this kind of slideshow processing and the tricks in my other articles on ODF scripting: you will probably need to fix something manually, unless you improve the scripts found here or all your slides can always have the same fixed number of bullet points, each with the same, more or less constant number of words. It’s practically impossible for a few quick scripts to make all slides look good without some manual tweaking here and there. Even in this case, however, the whole process may still take much less than typing by hand the content of all the slides in Impress.

Practical example of automatically generated ODF slideshow

Here’s what I’m talking about. This picture on the left shows the initial template: the one on the right shows the result, that is the filled slide you’ll get by running the scripts explained below on this plain text source (shameless self promotion: these are the conclusions of my essay on Why Open Digital Standards Matter in Government):

  ==Conclusions: what have we learned?==

  - The only way to guarantee that our data remain ours is to store them in file formats which are independent from any single software product
  - In and by itself, Free/Open Source software is not a solution: many files in the examples above are lost not because of software licenses, but simply because:
   - Programmers didn't bother to leave any format documentation
   - End users didn't bother to demand it
  - Only formats which are not only "Free as in Freedom" but also fully documented and officially maintained by a reliable, not-for-profit organization give real guarantees

The markup of this outline is the txt2tags format: lines which start and end with one or more “=” characters are headings. A dash as first character of a line indicates a list item, or a sub-list one if preceded by a white space. Much faster than working with the mouse, isn’t it? Personally I use txt2tags because it only consists of one very simple Python script which can convert outlines to many formats, from HTML to Pdf (via LaTeX) and MediaWiki. This said, it’s quite easy to convert the scripts which follow to recognize other markup systems.

The initial ODF template is the simplest possible one: only one type of slide, that only contains text in two levels of bullet points, on a bare background. The reason is to present the basic, very general trick, with one simple but complete example. Once you understand the basic concept, however, expanding it is pretty simple, even if you want to include images, and you can use whatever template you like.

ODF slideshow generator: preparing the template

Let’s now see the preparation work you need to do (but only once) and the actual scripts that automatically convert plain text to projector-ready slideshow. To download all the templates and scripts mentioned in this page, click here.

The first thing to do is to create with OpenOffice impress a single slide presentation with your sample layout, and save it in ODF format. Next, you have to unzip the resulting .odp file, modify with any text editor its content.xml file as described below and then zip everything again with the name template.zip.

You need to mess with the content.xml file for two reasons. The first is to copy into separate files the XML code corresponding to its bullet and sub-bullet and slide sections, recognizable from the tags shown in this picture.

The second is to open those files to replace slide number, title and the XML code you remove with special text strings, like MY_SLIDES_GO_HERE, which the scripts can recognize and replace with your content. If this looks boring, it is, but remember that it’s a one-time-only work.

ODF slideshow generator: here are the scripts

There are two scripts that you need to use for generating ODF slideshows. The first is a Bash one which manages all the files involved in the process, and then calls a Perl one that actually creates the new content of the slideshow by reading the text outline. If you need to use a different template you only need to modify that second script.

The Bash script, called odp_gen.sh and shown below, takes four arguments: the text outline, two XML templates (one for the single page, one for the whole content) and the zipped version of the reference OpenDocument slideshow:

  odp_gen.sh

   1  #! /bin/bash
   2  #syntax: odp_gen.sh outline slide slideshow template
   3
   4  ODP_NAME=`date '+%Y%m%d%H%M'`
   5  ODP_SCRIPT='/usr/local/bin/odp_gen.pl'
   6
   7  mkdir tmp_odp_gen
   8  cp $1 tmp_odp_gen/outline.txt
   9  cp $2 tmp_odp_gen/slide.xml
   10 cp $3 tmp_odp_gen/slideshow.xml
   11 cp $4 tmp_odp_gen/template.zip
   12 cd tmp_odp_gen
   13
   14 unzip template.zip >& /dev/null
   15 rm    content.xml
   16
   17 $ODP_SCRIPT outline.txt slide.xml slideshow.xml > content.xml
   18
   19 rm outline.txt slide.xml slideshow.xml template.zip
   20 find . -type f -print0 | xargs -0 zip ../$ODP_NAME > /dev/null
   21
   22 cd ..
   23 rm -rf tmp_odp_gen
   24 mv $ODP_NAME.zip $ODP_NAME.odp

It first creates a temporary folder (line 7) and then copies into it all the files received as arguments (lines 8/12). After expanding the zip archive and removing the original content.xml files, it runs odp_gen.pl to create a new one with the text taken from the outline (lines 14/17). Once we have that file, it’s just a matter of removing all the temporary files, zipping together whatever is left and rename it with the .odp extension (lines 19/24). Important: for a cleaner way to zip/unzip ODF files see the comments here.

Let’s now look inside the script which actually creates the new slideshow, odp_extractor.pl:

  odp_extractor.pl

     1  #! /usr/bin/perl
     2
     3  use strict;
     4
     5  my $XML_SLIDE_TEMPLATE;
     6  my $XML_SLIDESHOW_TEMPLATE;
     7  my $CURRENT_SLIDE_NUMBER = 0;
     8  my $SLIDE_TEXT = '';
     9  my $SLIDESHOW_TEXT = '';
    10  my $SLIDE_TITLE = '';
    11  ########################################################################
    12
    13  my $ODP_BULLET_POINT= <<"END_ODP_BULLET_POINT";
    14  <text:list text:style-name="L2"><text:list-item><text:p text:style-name="P3"><text:span text:style-name="T1">__BULLET_TEXT_HERE__</text:span></text:p></text:list-item></text:list>
    15  END_ODP_BULLET_POINT
    16
    17  my $ODP_SUB_BULLET_POINT= <<"END_ODP_SUB_BULLET_POINT";
    18  <text:list text:style-name="L2"><text:list-item><text:list><text:list-item><text:p text:style-name="P4"><text:span text:style-name="T1">__SUB_BULLET_TEXT_HERE__</text:span></text:p></text:list-item></text:list></text:list-item></text:list>
    19  END_ODP_SUB_BULLET_POINT
    20
    21  ######################################################################
    22
    23  open(XML_SLIDE, "< $ARGV[1]") || die "could not open page template $ARGV[0]n";
    24
    25  while (<XML_SLIDE>) {
    26      $XML_SLIDE_TEMPLATE .= $_;
    27  }
    28
    29  close XML_SLIDE;
    30
    31  open(TEXT_OUTLINE, "< $ARGV[0]") || die "could not open text outline $ARGV[1]n";
    32
    33  while (<TEXT_OUTLINE>) {
    34      chomp;
    35      if ($_ =~ m/^==(.*)==$/) {          # a new slide starts
    36      if ($CURRENT_SLIDE_NUMBER > 0) { #format the previous page
    37          my $CURRENT_SLIDE = $XML_SLIDE_TEMPLATE;
    38          $CURRENT_SLIDE =~ s/__SLIDE_NUMBER__/$CURRENT_SLIDE_NUMBER/g;
    39          $CURRENT_SLIDE =~ s/__SLIDE_TITLE_GOES_HERE__/$SLIDE_TITLE/g;
    40          $CURRENT_SLIDE =~ s/__SLIDE_TEXT_GOES_HERE__/$SLIDE_TEXT/;
    41          $SLIDESHOW_TEXT .= $CURRENT_SLIDE;
    42          $CURRENT_SLIDE = '';
    43          $SLIDE_TEXT = '';
    44      }
    45      $SLIDE_TITLE = $1;
    46      print STDERR "$CURRENT_SLIDE_NUMBER TITLE: $SLIDE_TITLE;n";
    47      $CURRENT_SLIDE_NUMBER++;
    48      }
    49
    50      if ($_ =~ m/^- (.*)$/) { # bullet point
    51      my $CURRENT_BULLET_TEXT = $1;
    52      my $CURRENT_BULLET_POINT = $ODP_BULLET_POINT;
    53      $CURRENT_BULLET_POINT =~ s/__BULLET_TEXT_HERE__/$CURRENT_BULLET_TEXT/;
    54      $SLIDE_TEXT .= $CURRENT_BULLET_POINT;
    55      }
    56
    57      if ($_ =~ m/^ - (.*)$/) { # sub-bullet point
    58      my $CURRENT_SUB_BULLET_TEXT = $1;
    59      my $CURRENT_SUB_BULLET_POINT = $ODP_SUB_BULLET_POINT;
    60      $CURRENT_SUB_BULLET_POINT =~ s/__SUB_BULLET_TEXT_HERE__/$CURRENT_SUB_BULLET_TEXT/;
    61      $SLIDE_TEXT .= $CURRENT_SUB_BULLET_POINT;
    62      }
    63  }
    64  close TEXT_OUTLINE;
    65
    66  my $LAST_SLIDE   = $XML_SLIDE_TEMPLATE;
    67  $LAST_SLIDE      =~ s/__SLIDE_NUMBER__/$CURRENT_SLIDE_NUMBER/g;
    68  $LAST_SLIDE      =~ s/__SLIDE_TITLE_GOES_HERE__/$SLIDE_TITLE/g;
    69  $LAST_SLIDE      =~ s/__SLIDE_TEXT_GOES_HERE__/$SLIDE_TEXT/;
    70  $SLIDESHOW_TEXT .= $LAST_SLIDE;
    71
    72  undef $/;
    73  open(XML_TEMPLATE_FILE, "< $ARGV[2]") || die "could not open content XML template $ARGV[1]n";
    74  my $XML_TEMPLATE = <XML_TEMPLATE_FILE>;
    75  close XML_TEMPLATE_FILE;
    76
    77  $XML_TEMPLATE =~ s/__MY_SLIDES_GO_HERE__/$SLIDESHOW_TEXT/;
    78  print $XML_TEMPLATE;
    79  exit;

The first ten lines of ods_gen.pl set up some auxiliary variables. Lines 13 and 17 are the hardest part, at least if you want to customize the script. $ODP_BULLET_POINT is the snippet of XML code which defines one single, first-level bullet point in a slideshow with the base layout shown above. Similarly, line 17 defines a sub-bullet: the way you distinguish one from the other is through the style-name attribute (P3 or P4 in this example). The script loads from external files (lines 23-29 and 72-75) two other XML templates, slide_template.xml for single slides and slideshow_template.xml for the whole document. The middle part, that is lines 33 to 70, is the one which loads the text outline, one line at a time, recognizes the txt2Tags markup and creates the equivalent XML/ODF version.

To understand how it works it’s probably better to start from the end, that is lines 57-62. Line 57 is a Perl regular expression which means “if the current line starts with a space, a dash and then another space, save all the following text into the Perl built-in variable $1″. That variable is then copied to $CURRENT_SUB_BULLET_TEXT. Immediately after, the script copies the XML code for generic sub bullets into $CURRENT_SUB_BULLET_POINT, and replaces the placeholder string inside it (SUB_BULLET_TEXT_HERE) with the content of $CURRENT_SUB_BULLET_TEXT. Finally, this shiny sub-bullet is added to $SLIDE_TEXT. Lines 50 to 55 do the same thing with first-level bullets.

The block from line 35 to 48 is a bit more complex because it must do two things. First, like the others, it recognizes the markup for a slide title and saves it into another auxiliary variable. A slide title, however, means that (unless we are at the very beginning, hence the check at line 36) we have a full slide worth of XML, accumulated while parsing the previous line, into $SLIDE_TEXT.

Therefore, before continuing, we have to load the single slide template into $CURRENT_SLIDE and replace the three placeholder strings with, respectively, slide number, slide title and slide content. Once this has been done, we can dump the result into $SLIDESHOW_TEXT and continue. Lines 66 to 70 do the very same thing to add the content of the last slide.

Once the outline has all been converted to XML format and saved into $SLIDESHOW_TEXT, we’re practically done. All is left is to place the content of that variable in place of the MY_SLIDES_GO_HERE string inside the complete template (line 77) and print everything to standard output.

And if something isn’t clear…

Try the scripts, and you’ll see that the whole process is simpler than it looks from this explanation, and don’t hesitate to let me know if something isn’t clear!

(the content of this page was originally part of a larger article written for Linux Format)

How to generate and update ODF spreadsheets without OpenOffice

Sooner or later, many of us need to process some numeric data in plain text format, be they system logs or sales totals, and to generate reports and charts out of those data. Scripts and utilities like gnuplot could be very useful in such cases, except when the results needs to be a normal spreadsheets with charts and formulas, which is both editable and compatible with people who only know how to deal with spreadsheets in office suites.

The standard solution in such cases is to import all the raw data in programs like Calc or Gnumeric, enter by hand all the related formulas, generate charts, format everything to taste and email the result to whoever needs it. This is perfectly acceptable if it’s a one-time task, but what if your boss demands a new version of that spreadsheet, with fresh data, every day?

Luckily, the flexibility and openness of ODF make tasks like this very easy. There is no reason to suffer and waste time with such chores only because somebody wants a “real”, editable spreadsheet with plenty of nice charts. If your raw data and the layout of the required spreadsheet have a constant structure, it only takes a bit of shell and Perl scripting to generate automatically a new ODF spreadsheet with the same layout whenever the raw data change.

The general procedure, shown in the diagram, is explained in Why and how the OpenDocument format can save you a lot of time! and works even with data that aren’t stored in some database, or when you must change text, not just numbers

What’s inside an OpenDocument spreadsheet

An ODF spreadsheet is just a compressed zip archive. The actual data are in a file called content.xml. Each sheet is a two dimensional table (whose cells contain numbers, strings or formulas) inside that file.
Each row in those tables is an element called "<table:table-row>". Single cells are marked as "<table:table-cell>". The actual value of a number is stored in two cell attributes called "office:value-type" and "office:value"; formulas, instead, are inside “table:formula” attributes. Cell ranges are defined with square brackets, like "[.B17:.D19]".

Objects live in a folder called… Objects, which contains sub-folders called “Object 1″, “Object 2″ and so on. Each chart of an ODF spreadsheet is stored as a "<draw:object>" element written into the content.xml file of its own “Object n” sub-folder. More exactly, inside this second file a chart has some attributes specifying its appearance plus one XML element called "<table:table>" which contains a copy, in a slightly different format, of all the cells related to the chart. It is this copy which is used by ODF applications to do the actual drawing.

Practical example: average bandwidth consumption

(note: all the scripts and .ods files mentioned below are here).

Let’s assume that your raw data are the bandwidth consumption of two servers, calculated every hour over the last 24 hours. They may come to you from a script, a database or via email. This doesn’t matter, as long as the format is constant. In this example we have ASCII files made of 24 lines text divided in three columns separated by tabs: a time-of-day label plus server 1 and server 2 bandwidths in MBit/sec:


Time-of-day     BW 1   BW2
  Midnight        4.5    6.4
                  6.3    6.3
                  3.1    6.1
                  1.85   5.87
                  etc...


What the boss, instead, wants first thing every morning, is a nice spreadsheet like this, which also calculates the average bandwidth consumption in each hour and plots it.

The first thing to do is to unzip the .ods file. Inside it, we’ll find all those numbers in two distinct places, as explained above. Using the data of Listing 1, the XML code (inside the content.xml file) for row 17 of the spreadsheet shown in the picture would look like this (note the formula at the end):

  Listing 2:
  <table:table-row table:style-name="ro1"><table:table-cell office:value-type="string"><text:p>Midnight</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="4.5"><text:p>4.5</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="6.4"><text:p>6.4</text:p></table:table-cell><table:table-cell table:formula="oooc:=SUM(B17:C17)/2"/></table:table-row>

This, instead, would be the XML source for the first three points of the three lines in the chart and their label, inside “Object 1/content.xml”:

  Listing 3:
  <table:table-row><table:table-cell office:value-type="string"><text:p>Midnight</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="4.5"><text:p>4.5</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="6.4"><text:p>6.4</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="5.45"><text:p>5.45</text:p></table:table-cell></table:table-row>

Yes, it’s terribly verbose but it’s plain text, isn’t it? Therefore, obtaining a spreadsheet with the same layout but updated charts and formula results from different raw data is just a matter of replacing all the strings of the XML rows above with other strings which contain the new raw numbers. In order to do this, we need to transform those XML files into a spredsheet template. To prepare it, execute these commands to unpack the spreadsheet:

  Listing 4:
  #> mkdir temp
  #> cp sample_spreadsheet.ods temp/sample.zip
  #> cd temp
  #> unzip sample.zip
  #> ls -l
  content.xml
  Object 1/content.xml
  ObjectReplacements/Object 1
  ...other files omitted for brevity
  #> rm "ObjectReplacements/Object 1"

The file “ObjectReplacements/Object 1″ is a binary version of the chart, created when the file was last saved. It is also what OpenOffice would display by default if you reopened the file, until you forced the application to redraw the chart by changing the value of some cell. It’s necessary to remove it from the template, otherwise, no matter what data are in the new spreadsheet, you’d see the original drawing until you updated some cell.

In general, the trasformation of the two content.xml files in templates is simple, no matter how complex the table or the chart are. Open them in any text editor, locate all the table rows (those of Listing 2 and 3) and replace them with one placeholder string. After that, the main file should look like Listing 5 and “Object 1/content.xml” should become similar to Listing 6:

  Listing 5:
  ...lots of XML elements...
  </table:table-row>MY_DATA_GO_HERE</table:table>
  ...lots of other XML elements...

  Listing 6
  ...lots of XML elements...
  <table:table-rows>MY_CHART_GOES_HERE</table:table-rows>
  ...lots of other XML elements...

Just be very careful to cancel only the row elements and nothing else, otherwise you’ll corrupt the template. Once you’re finished, save everything in a tar file called ods_bw_template.tar. Remember, this isn’t a valid ODF file anymore, just a template, that’s why you shouldn’t use the .ods extension.

The actual scripts

The generation of other spreadsheets with new data happens by means of the scripts in Listing 8, ods_gen.pl. Instead of running it directly though, we use it inside the shell wrapper called ods_gen.sh shown in Listing 7, which I’ll explain first. ods_gen.sh takes two parameters:

# ods_gen.sh raw_data_1.txt ods_bw_template.tar

which are a raw data file with the format of Listing 1 and, respectively, the template. After copying them in the temporary folder tmp_ods_gen and expanding the tar archive (lines 6 to 11), ods_gen.sh calls ods_gen.pl two times. The first (line 13) creates a new main content.xml file (hence the “main” argument”). Then the same script, called with the “chart” option in line 16, updates the other content.xml file, the one used to draw the chart. The final part, from line 19 to the end, is just housekeeping: remove all temporary files, zip everything else and save it as $ODS_NAME.ods

  Listing 7: ods_gen.sh
       1  #! /bin/bash
       2
       3  ODS_NAME=`date '+%Y%m%d%H%M'`
       4  ODS_SCRIPT='/home/marco/bin/ods_gen.pl'
       5
       6  echo Loading $1 into $ODS_NAME.ods with template $2
       7  mkdir tmp_ods_gen
       8  cp $1 tmp_ods_gen/data.txt
       9  cp $2 tmp_ods_gen/template.tar
      10  cd tmp_ods_gen
      11  tar xf template.tar
      12
      13  $ODS_SCRIPT data.txt content.xml main > tmp_content_main.xml
      14  mv tmp_content_main.xml content.xml
      15
      16  $ODS_SCRIPT data.txt 'Object 1/content.xml' chart > tmp_content_chart.xml
      17  mv tmp_content_chart.xml 'Object 1/content.xml'
      18
      19  rm template.tar data.txt
      20
      21  find . -type f -print0 | xargs -0 zip ../$ODS_NAME > /dev/null
      22
      23  cd ..
      24
      25  rm -rf tmp_ods_gen
      26
      27  mv $ODS_NAME.zip $ODS_NAME.ods

As I anticipated, the real work happens inside the ods_gen.pl script of Listing 8:

  Listing 8: ods_gen.pl

       1  #! /usr/bin/perl
       2
       3  use strict;
       4  my $TABLE_DATA = '';
       5  my $CURRENT_ROW = '';
       6  my $CURRENT_FORMULA = '';
       7
       8  ########################################################################
       9  #
      10 # Spreadsheet-dependant variables
      11
      12 my $ODS_MAIN_ROW_TEMPLATE= <<"END_MAIN_ROW_TEMPLATE";
      13 <table:table-row table:style-name="ro1"><table:table-cell office:value-type="string"><text:p>MY_LABEL_STRING</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="MY_FIRST_NUMBER"><text:p>MY_FIRST_NUMBER</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="MY_SECOND_NUMBER"><text:p>MY_SECOND_NUMBER</text:p></table:table-cell><table:table-cell table:formula="oooc:=MY_ODS_FORMULA"/></table:table-row>
      14  END_MAIN_ROW_TEMPLATE
      15
      16  my $ODS_CHART_ROW_TEMPLATE= <<"END_CHART_ROW_TEMPLATE";
      17  <table:table-row><table:table-cell office:value-type="string"><text:p>MY_LABEL_STRING</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="MY_FIRST_NUMBER"><text:p>MY_FIRST_NUMBER</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="MY_SECOND_NUMBER"><text:p>MY_SECOND_NUMBER</text:p></table:table-cell><table:table-cell office:value-type="float" office:value="MY_ODS_FORMULA"><text:p>MY_ODS_FORMULA</text:p></table:table-cell></table:table-row>
      18  END_CHART_ROW_TEMPLATE
      19
      20  my $ODS_FORMULA    = 'SUM([.RANGE_START:.RANGE_END])/2';
      21  my $START_COLUMN   = 'B';
      22  my $END_COLUMN     = 'C';
      23  my $CURRENT_ROW_ID = 17;
      24
      25  ######################################################################
      26
      27  die "No correct operation specified!n" if (($ARGV[2] ne 'main') &&
      28    ($ARGV[2] ne 'chart'));
      29
      30  open(DATA, "< $ARGV[0]") || die "could not open data file $ARGV[0]n";
      31
      32  while (<DATA>) {
      33    chomp;
      34    my ($STRING, $NUM_1, $NUM_2) = split /t/;
      35
      36    if ($ARGV[2] eq 'main') {
      37       $CURRENT_ROW     = $ODS_MAIN_ROW_TEMPLATE;
      38       $CURRENT_FORMULA = $ODS_FORMULA;
      39       $CURRENT_FORMULA =~ s/RANGE_START/$START_COLUMN$CURRENT_ROW_ID/;
      40       $CURRENT_FORMULA =~ s/RANGE_END/$END_COLUMN$CURRENT_ROW_ID/;
      41       $CURRENT_ROW_ID++;
      42    }
      43    if ($ARGV[2] eq 'chart') {
      44       $CURRENT_ROW = $ODS_CHART_ROW_TEMPLATE;
      45       # Change next line to match your own $ODS_FORMULA!!
      46       $CURRENT_FORMULA = sprintf("%10.2f", ($NUM_1 + $NUM_2)/2);
      47       $CURRENT_FORMULA =~ s/s*//g;
      48    }
      49    chomp $CURRENT_ROW;
      50    $CURRENT_ROW =~ s/MY_LABEL_STRING/$STRING/g;
      51    $CURRENT_ROW =~ s/MY_FIRST_NUMBER/$NUM_1/g;
      52    $CURRENT_ROW =~ s/MY_SECOND_NUMBER/$NUM_2/g;
      53    $CURRENT_ROW =~ s/MY_ODS_FORMULA/$CURRENT_FORMULA/g;
      54    $TABLE_DATA .= $CURRENT_ROW;
      55  }
      56  close DATA;
      57
      58  undef $/;
      59  open(XML_TEMPLATE_FILE, "< $ARGV[1]") || die "could not open content XML template $ARGV[1]n";
      60  my $XML_TEMPLATE = <XML_TEMPLATE_FILE>;
      61  close XML_TEMPLATE_FILE;
      62
      63  $XML_TEMPLATE =~ s/MY_DATA_GO_HERE/$TABLE_DATA/ if ($ARGV[2] eq 'main');
      64  $XML_TEMPLATE =~ s/MY_CHART_GOES_HERE/$TABLE_DATA/ if ($ARGV[2] eq 'chart');
      65  print $XML_TEMPLATE;
      66  exit;

The important part starts at line 12, after the initialization of some auxiliary variables: set them according to your needs. $ODS_MAIN_ROW_TEMPLATE and $ODS_CHART_ROW_TEMPLATE are the parts from content.xml and, respectively, “Object 1/content.xml” already shown in Listings 2 and 3. The only change to do is to replace numbers and formulas with the strings MY_LABEL_STRING, MY_FIRST_NUMBER, MY_SECOND_NUMBER and MY_ODS_FORMULA in the proper places. If you want to modify this script to generate spreadsheets with a different layout, this is where you must start: create your template, open it as explained above and copy a complete table row from each file in these two variables.

Lines 20 and 46 are equally important: make sure they are consistant with each other because that’s where you define your formula, first in ODF and then in Perl format. The value in line 20 contains two placeholders, RANGE_START and RANGE_END, because the actual cell addresses are substituted later on, in lines 39 and 40, using the starting values defined in lines 21 to 23.

In lines 30 to 34 the script reads the raw data file one row at a time, loading the values from the three columns in $STRING, $NUM_1 and $NUM_2.

When ods_gen.pl creates the new XML code, numbers can just go in as they are, but formulas need a little extra processing. If we are generating the main content.xml file, then we have to rebuild the original formulas with their absolute cell addresses, which is just what happens in lines 38 to 40. If we are creating the chart XML file instead, we have to calculate the numeric result of the formula (lines 46 and 47).

In both cases the variable $CURRENT_ROW is pre-loaded in lines 37 or 44 with the template variable corresponding to the kind of file we need to generate (‘main’ or ‘chart’). Finally, in lines 50 to 54 we perform all the substitution and add the row to $TABLE_DATA.

After closing the data file, we dump inside the $XML_TEMPLATE variable all the content of the XML file passed as second parameter, put the $TABLE_DATA in place of the MY_DATA_GO_HERE or MY_CHART_GOES_HERE variables and write everything to standard output.

That’s it, really. Once the scripts are ready, you can automatically generate as many versions of the initial reference spreadsheets as you like by just running the scripts with a different data set.

The results, as shown in the last two pictures, will be each time a different but 100% valid ODF spreadsheet with different data and charts (left picture) and the same editable formulas.

Pros and cons of this method

  • Pros
    • all the formatting and layout work happens just once, quickly, in a graphical interface and only if you do have to create the reference spreadsheet from scratch.
    • the final result is editable or embeddable in other spreadsheet without programming
    • The two scripts explained here are less than 100 lines of code together and only need bash and Perl: they will work practically everywhere, even if OpenOffice isn’t available and/or you can’t install XML libraries or other utilities.
  • Cons
    • less scalable, elegant and flexible that doing the same things with real ODF/XML tools (cfr the first article of this series) or LaTeX
    • (in extreme cases) accuracy.Lines 20 of Listing 3 calculates the numbers that OpenOffice displays in the cells of column D. Line 46 of the same listing substitutes MY_ODS_FORMULA inside “Object 1/content.xml” with the numbers that OpenOffice uses as Y-axis values when drawing the yellow line of the chart. Theoretically, when dealing with complex floating point math, some numbers in the first set may not be mathematically equal to their counterparts in the second set. The difference, if any, is surely negligible in the example here and most real world scenarios, but it doesn’t hurt to be aware of the issue.

(the content of this page is based on an article originally written for Linux Format)

Comments to, How to automatically create ODF invoices without OpenOffice

(Note: these are the comments appended to my original article, which I had to put in a separate page when I switched from Drupal to WordPress)

Just came to your site…

Just came to your site following a link from linuxtoday. Wow! This opens windows of opportunities! Somehow I’ve totally missed out on the fact that odt documents are just zip files. I’ve been reading a bit through some content.xml files. And it seems that it should be possible to use openoffice from a text editor just as fine.

I made my thesis in OpenOffice, but only after convincing myself that it wasn’t worth it to invest a lot of time in learning (La)tex. There are always these small things that just change when working on a big file. With my current understanding, it would make perfect sense to write a document in a text editor, and then put it into some file of which I like the lay-out. Use OpenOffice as a mark-up language. And that is just frigging awesome!

I haven’t started to really research or test the opportunities, I just want to express my enthusiasm. Maybe you could shed a light on how to use openoffice as a true mark-up language?

Thanks a lot!

Joep

On ODF as a markup language

My answer:

Joep,

first of all, thanks a lot for your appreciation of my work. There is no doubt that you can create OpenDocument (not openoffice!) files only using a plain text editor like notepad, emacs, Vim and so on. However, I would avoid doing so because it is very boring, time consuming and error-prone. One of the articles I have in the pipeline is how to write text in a much simpler and faster markup language liketxt2tags and then generate the OpenDocument version of the text with a script.

Another way to improve the zip/unzip part of this script…

….has been kindly suggested by Sander at lxer.com (thanks, Sander!):

The script will lead to an ODF file with an invalid mimetype. Creating an ODF file using the zip command on the commandline is only slightly more complicated. See this article on my blog

Useful improvement

Soren Roug just suggested an improvement through the ODF-discuss list:

It's somewhat of a brute force solution. The main issue is that the mimetype
has to be the first element in the zip-file and not be compressed in order to
comply fully with the specification

Since the my_template.odt file already complies, it is much easier to take out
only content.xml and then stuff it back in.



cp my_template.odt new_$FILENAME.odt
unzip new_$FILENAME.odt content.xml
sed ... content.xml >custom_content.xml
mv custom_content.xml content.xml
zip -f new_$FILENAME.odt content.xml

Thanks, Soren!

How to automatically create OpenDocument invoices without OpenOffice

Articles on how to create OpenDocument invoices already exist but almost always they require you to start and use OpenOffice manually each time. Here, instead, I’ll show how to have your computer to do all your OpenDocument work for you.

odf_scripting_sample_invoice The script explained below takes an ODF template like the one of the left and generates an ODF text file like the one you see on the right below, which is ready to be printed or sent via email (follow the links to download the template or the resulting ODF invoice).

odf_scripting_customized_invoice The advantages of creating the invoice with a script rather than with macros are explained in detail in the “Why and how ODF can save you a lot of time” page: in a nutshell, once you have created and saved with any ODF compliant word processor the initial template, the whole process is completely automatic, so it could run unattended and be integrated with other backend systems even where OpenOffice isn’t installed.

The script takes two arguments, the template name and an invoice data file:

odf_invoice_generator.sh odf_scripting_sample_invoice.odt my_data.sh

then opens the template, replaces placeholder data or strings like e.g. __Customer_name with the proper values from the invoice data file and finally saves everything as a separate OpenDocument text file. The data file has an extremely simple format, since it’s only variable assignments in shell script syntax:

  marco => cat my_data.sh
  INVOICE_DATE='2010/05/15'
  VENDOR_CODE='007'
  PO_NUMBER='Purchase Order #1'
  TOTAL=10
  ISSUE=150
  DESCRIPTION='Here is your invoice'

and can be automatically generated on the spot by querying a database, by a Web server or in many other ways.

Here is the complete script, followed by an explanation:

       0 #!/bin/bash
       1  WORK_DIR=odt_invoice_generator_temp_dir
       2
       3  rm -rf $WORK_DIR
       4  mkdir  $WORK_DIR
       5  FILENAME=`basename $1 .odt`
       6
       7  cp     $1 $WORK_DIR/my_template.odt
       8  cp     $2 $WORK_DIR/my_data.sh
       9
      10  ## preparation
      11  cd     $WORK_DIR
      12  mkdir  work
      13  mv     my_template.odt work
      14  cd     work
      15  source ../my_data.sh
      16  unzip  my_template.odt > /dev/null
      17  rm     my_template.odt
      18
      19  ## replace text strings
      20  sed "s|__INVOICE_DATE|$INVOICE_DATE|"  content.xml  
      21  | sed "s|__VENDOR_CODE|$VENDOR_CODE|"               
      22  | sed "s|__PO_NUMBER|$PO_NUMBER|"                   
      23  | sed "s|__TOTAL|$TOTAL|g"                          
      24  | sed "s|__ISSUE_NUMBER|$ISSUE|"                    
      25  | sed "s|__DESCRIPTION|$DESCRIPTION|"               
      26  | sed "s|__Customer_name|$Customer_name|"           
      27  > custom_content.xml
      28  mv custom_content.xml content.xml
      29
      30  ## zip everything, rename it as .odt file and clean up
      31  find . -type f -print0 | xargs -0 zip ../$FILENAME > /dev/null
      32  cd ..
      33  mv $FILENAME.zip ../new_$FILENAME.odt
      34  cd ..
      35  rm -rf $WORK_DIR

The lines from 1 to 17 don’t do anything difficult: create a temporary working directory (WORK_DIR) copy the template inside it, unzip the template, and load from the data file (line 15) the values that must fill the template.

The (relatively) tricky part are lines 19 to 26: this is where a series of sed commands replaces each placeholder string in the content.xml file which contains the template text with its value loaded from my_data.sh.
If you change the template you must add here one sed command for each string you want to substitute; the order is not important. Just remember that if a string occurs multiple times, as is the case with the __TOTAL price, you must add the g (global) option to sed (cfr line 23), otherwise the script will only replace the first occurrence of that string.

If you have any question or suggestion about this script, please email me or (even better, add it in the comments.

Why and how the OpenDocument format can save you a lot of time!

The OpenDocument Format (ODF) is an internationally recognized open standard for digital office documents whose importance has also been acknowledged by Microsoft. ODF is good for a lot of reasons I have already explained in Everybody’s Guide to OpenDocument. However, there is also one more reason why ODF is great for everybody who must produce a lot of office documents, one that will be the subjects of many posts on this website: ODF is really simple to generate or edit automatically. Even if you aren’t a professional programmer, it takes very little effort to put together a script that generates or processes in any way texts, presentations or spreadshets in ODF format.

How the openness of ODF makes automatic generation of documents much simpler

Very often, we use computers to produce many different versions, every time with new data, of some reference text, presentation or spreadsheet. Changing those kind of files manually makes sense only if it happens once in a while. When it’s a regular activity, instead, it can become a huge waste of time. ODF, however, makes it very quick and easy to insert raw data into texts, spreadsheets or presentations with the slightest possible amount of manual work and without even running OpenOffice. This is possible because an ODF file is just a ZIP archive, with pictures and macros in their own folders and the actual text written, in XML format, inside a file called content.xml. Therefore, in order to create a new, 100% compliant ODF file with different data, tables or images, you only have to open the archive, process the text inside content.xml or put new pictures in their folder if necessary and zip everything again. You must only use OpenOffice once, to create a template by hand if you if you don’t find a suitable one online.

The power of script-based ODF processing

You could perform repetitive generation and editing of ODF office files even manually, with a text editor like Notepad, Emacs or VI. The real power of ODF, of course, is in the fact that you can (and should) do all that processing automatically, with very simple shell or Perl scripts, that is with tools that are included in any Gnu/Linux distribution but can also work on Windows and Mac. The main advantages of this approach to office document processing are:

  • it works even without Openoffice, so it could even run on a server
  • there is no need of any relational database but you can use one if necessary
  • learning to do these things with shell scripts instead of OpenOffice macros:
    • gives you skills that you can reuse in a lot of other contexts.
    • allows very easy integration with other command line tools, from cron jobs to mass mailing, chart generation with Gnuplot or scaling, watermarking, framing of all images in a document with ImageMagick
  • above all, it’s much simpler (and faster) than you’d think!

The last point is the most important. Using the method explained here everybody with just a basic grasp of shell scripting can generate, modify or analyze hundreds of ODF text documents with just a few minutes of easy coding.

Of course, this approach is not really flexible, scalable or really robust, unless you add lots of code for error management, but the idea here is not to develop industrial strength solutions. If that’s what you need, you’ll have to either use real XML based tools like Odfpy or go straight to the source, the book OpenDocument Essentials by J. David Eisenberg, that you can also purchase at Lulu.com.

This said, there are tons of cases where heavyweight tools like those aren’t worth studying, installing and deploying, but people still end up wasting many hours on repetitive edits. Learning how to write quick and dirty shell scripts that can open and update an ODF file is an easy but huge time saver in such situations.

What’s next?

Here are some of the ODF scripting recipes that you’ll find on this website in the next days (but if there are other recipes that you would like to see published, just ask and if possible I’ll write them!):

(note: some of these posts are updated excerpts of articles originally written for Linux Format, and are republished here with their permission)