How to automatically print OpenDocument files, or convert them to PDF, MS Office and other formats

The scripts and tricks in the ODF scripting section of this website show how to create office-ready texts, presentations and spreadsheets automatically, in the OpenDocument format, which is a worldwide standard. This is all many people need to work today. Sometimes, however, it’s still necessary to either print those documents or send them to somebody in other formats, like PDF or those of the older releases of Microsoft Office (newer releases of this program are already partially compatible with OpenDocument through free plugins, so if your partners have those versions they should really use those plugins instead of bothering you with requests for drug-like, legacy file formats, but that’s another story).

Of course, if you only need to print or convert to other formats once in a while, there’s no reason not to do it from OpenOffice. The simple tricks explained below, however, are a life-saver when you need to do this many times and would rather let your computer do it for you while you have a coffee or something.

On Linux systems it is easy to do all this, and even send the converted files via email, automatically. Let’s assume that you have an OpenDocument text, spreadsheet or presentation already sitting in some folder, waiting to be processed.

Both printing and conversion to PDF, HTML or MS Office formats from the command line need OpenOffice to work. In the second case, the reason is that what does the actual work is one of the OpenOffice macros linked below: when you launch OpenOffice, it executes that macro on the file indicated by the user and then exits. Macros are not needed for printing, because OpenOffice has dedicated options for that. Usage of OpenOffice from the command line is explained on the OOo wiki. In a nutshell, this is the correct syntax:


  soffice -invisible macro://path-to-macro($FILE)


On some systems, you may need to provide the complete path to the soffice program. The -invisible option is what makes OpenOffice start without a graphical interface. The file to process must be passed as argument ($FILE) to the macro.

The command above is all you need if you are working on a complete Gnu/Linux desktop, that is a system that also runs a graphical display server (the X server). For the record, you can do the same thing in Windows with a batch script like this (taken from an OOoforum thread):


  @ECHO OFF

  "c:\program files\OpenOffice.org1.1.4\program\soffice" -invisible "macro:///Standard.Module1.ConvertToPDF(%1)"


When you want to work inside a Linux Web or print server instead, that is on a computer where X was never installed, you need to set some extra variables before launching OpenOffice, otherwise it won’t start. This is how to do it (the explanation for the extra commands is in the thread in which I found them, which also includes instructions on how to install OpenOffice on a remote server):


  export PATH=$PATH:/usr/bin/X11
  export LANG=en_US
  export HOME=/var/www
  xvfb-run -a /usr/bin/soffice -invisible macro://path-to-macro($FILE)


Please note the extra piece in the actual command, that is in the last line above: `xvfb-run -a`. Xvfb is a smaller X server used in special situations like this, when a full X server can’t be installed. Also, don’t forget that, depending on the server configuration and your actual needs, you’ll probably have to change the LANG and HOME variables.

Show me the macros!

The previous paragraphs explain how to run OpenOffice from the command line on Linux or Windows in order to execute any macro. Let’s now look at the actual macros we need to print, or to save in Microsoft or other formats. Several are available online.

Those with the best explanation, which includes details on how to install any macro in OpenOffice, are SaveAsPDF and SaveAsDoc. The beauty of these macros is that it is very easy to modify them to save in HTML or any other format that OpenOffice can handle! You just have to substitute the right values for the file extension (MYEXTENSION) and the filter name (MY_FILTER_NAME) in this part of the macro:


   cFile = Left( cFile, Len( cFile ) - 4 ) + ".MYEXTENSION"
   cURL = ConvertToURL( cFile )

   oDoc.storeToURL( cURL, Array( _
            MakePropertyValue( "FilterName", "MY_FILTER_NAME" ) ) )

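Filter names are not always obvious. As a hedged illustration, a small shell helper can map a target extension to a filter name; the names below are the Writer export filters I believe OpenOffice uses, but check the filter list of your own installation before relying on them.

```shell
#!/bin/sh
# Map a target file extension to the corresponding OpenOffice Writer
# export filter name. The filter names are assumptions: verify them
# against your OpenOffice version before use.
pick_filter() {
  case "$1" in
    pdf)  echo "writer_pdf_Export" ;;
    doc)  echo "MS Word 97" ;;
    html) echo "HTML (StarWriter)" ;;
    txt)  echo "Text" ;;
    *)    echo "no filter known for .$1" >&2; return 1 ;;
  esac
}

pick_filter pdf    # prints writer_pdf_Export
```

A helper like this keeps the extension and filter name in one place, so a batch script can derive both MYEXTENSION and MY_FILTER_NAME from a single argument.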

Another macro that saves an OpenDocument file in PDF format was posted to the Fedora mailing list. Whichever macro you choose, put it in a suitable folder, accessible from the script and user account that will use it, and replace the path-to-macro string above with the actual full path to the macro in the file system.
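To convert many files in one go, a small loop is enough. The sketch below is my own illustration, not code from any of the macros above: Standard.Conversion.SaveAsPDF is a made-up library path, so substitute the macro URL you actually installed. The echo makes this a dry run; remove it to really execute the commands.

```shell
#!/bin/sh
# Dry-run batch conversion: print the soffice command for each file.
# The macro path is a placeholder and the file names are just examples.
for FILE in report.odt invoice.odt; do
  echo soffice -invisible "macro:///Standard.Conversion.SaveAsPDF($FILE)"
done
```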

How to print or email OpenDocument files from the command line

In order to do this we just need two other command line options of OpenOffice (type `soffice -?` at a command prompt for the complete list):


  soffice -invisible -p <documents...>
  soffice -invisible -pt <printer> <documents...>


They both print all the specified documents. The only difference is that the first one uses the default printer, while the second sends everything to the printer given as its first parameter.

Finally, if you also want your script to email the files it generated on your behalf, you can use the text-based Mutt email client like this ($EMAIL_TEXT is a separate text file containing the text of the message):


   mutt -s "$SUBJECT" -i $EMAIL_TEXT -a $FILE_TO_BE_ATTACHED -- $RECIPIENT



How to transform (almost) plain ASCII text to Lulu-ready PDF files, part 2

This page gives a general overview of a workflow for transforming ASCII files into print-ready PDF books. The reasons for setting up such a flow in this way are explained in the first part of this tutorial.

Basic workflow

The basic usage of txt2tags is really simple. Once you’ve written something that you need to convert to PDF, plain text or HTML, you can launch the graphical interface with the --gui option or run a command like this at the prompt:

  txt2tags -t xhtml -i mypost.txt -o mypost.html

This will tell txt2tags to save in the mypost.html file an HTML version of the content of mypost.txt. What tells the script the desired output format is the -t (target) option. In this case it is xhtml. Had it been txt or tex, it would have produced a plain text or LaTeX file.

This figure shows the txt2tags source of this article alongside with its plain text, HTML and PDF versions.

As you can see, syntax coloring for txt2tags is already available for Kate (the editor shown above) as well as Emacs, Vim and other popular text editors. In order to obtain PDF files, you need to run pdflatex or similar tools on the .tex file created by txt2tags:

  txt2tags -t tex -i mypost.txt -o mypost.tex
  pdflatex mypost.tex

From single files to books

The real power of txt2tags, at least for me, is that it makes it easy to work on multiple, completely independent source files as if they were one (more on this later). This makes it a breeze to create whole books, plus sets of HTML pages or other content related to them, always keeping every version in sync and interlinked with the others. Here is a real world case, that is how I created the PDF source and its HTML counterparts for the Guide.

I had some specific requirements, which by the way are common to many other projects of mine. First, I wanted each chapter of the book to be in a separate file, both to make incremental backups easier and to collate files from different previous projects without duplicating them. Then I wanted to create an independent, online HTML list of all the web pages mentioned in the book, with the same reference numbers used in the printed copy. I also wanted each of those links in the HTML list to have a descriptive caption that I had written by hand. This is, by the way, the reason why I worked out a custom, but relatively simple, cross-referencing system instead of doing everything in LaTeX. For example, at a certain point in the book I wrote that Windows Vista has been called a “landfill nightmare”. This is the corresponding sentence in the source file, complete with txt2tags markup, which includes the reference URL:

  the UK Green Party officially declared Vista... a ["landfill nightmare" http://www.greenparty.org.uk/news/2851]

I wanted the PDF version of the chapter to include a reference number like [19 - 2], meaning it’s the second cross-reference of chapter 19. I also wanted the HTML list to associate to that link the same number and the caption ‘Windows Vista? A “landfill nightmare”’. Using the scripts explained below produced the HTML source for the online version of the chapter, the PDF shown in the previous figure and the HTML list with the same reference numbers that you can see online.
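The numbering scheme itself is trivial to sketch. The function below is my own illustration of the convention, not the author's actual code: a per-chapter counter producing labels like [19 - 2].

```shell
#!/bin/sh
# Sketch of the reference numbering convention: the N-th URL found in
# chapter C gets the label "[C - N]". Illustration only, not the real script.
CHAPTER=19
COUNT=0
next_ref() {
  COUNT=$((COUNT + 1))
  printf '[%d - %d]\n' "$CHAPTER" "$COUNT"
}

next_ref   # first URL of chapter 19 -> [19 - 1]
next_ref   # second URL -> [19 - 2], the "landfill nightmare" reference
```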

The main script I used to create the PDF version ready for upload shown above is published and explained in the last part of this tutorial. The PDF resulting after processing the cross-references is shown here.

How to transform (almost) plain ASCII text to Lulu-ready PDF files, part 3

This is the core script I used to transform a set of plain ASCII files with the txt2tags markup into one print-ready PDF file. Part 1 of this tutorial explains why I chose txt2tags as the source format and Part 2 describes the complete flow.

Book creation workflow

  Listing 1: make_book.sh

    1   #! /bin/bash
    2
    3   CONFIG_DIR='/home/marco/.ebook_config'
    4   PREPROC="%!Includeconf: $CONFIG_DIR/txt2tags_preproc"
    5
    6   CURDIR=`date +%Y%m%d_%H%M_book`
    7   echo "Generating book in $CURDIR"
    8   rm -rf $CURDIR
    9   mkdir $CURDIR
   10   cp $1 $CURDIR/chapter_list
   11   cd $CURDIR
   12
    13   FILELIST=`cat chapter_list | tr "\012" " " | perl -n -e "s/\.\//..\//g; print"`
   14
   15   echo ''                                 >  source_tmp
   16   echo  $PREPROC                          >> source_tmp
    17   sed 's/\.\//%!Include: .\//g' $FILELIST >> source_tmp
   18
   19   replace_urls_with_refs.pl source_tmp > source_with_refs
   20
   21   txt2tags -t tex -i source_with_refs -o tex_source.tex
   22   perl -pi.bak -e 's/OOPENSQUARE/[/g'   tex_source.tex
   23   perl -pi.bak -e 's/CLOOSESQUARE/]/g'  tex_source.tex
   24
   25   #remove txt2tags header and footer
   26   LINEE=`tail -n +8 tex_source.tex | wc -l`
   27   LINEE_TESTO=`expr $LINEE - 4`
   28   tail -n +8 tex_source.tex | head -n $LINEE_TESTO > stripped_source.tex
   29
   30   source custom_commands.sh
   31
   32   cat $CONFIG_DIR/header.tex stripped_source.tex $CONFIG_DIR/trailer.tex > complete_source.tex
   33   pdflatex complete_source.tex
   34   pdflatex complete_source.tex
   35
   36   # Generate URL list in HTML format
    37   generate_url_list.pl chapter_list html | txt2tags -t xhtml --no-headers -i - -o url_list.html

All the txt2tags settings and some LaTeX templates are stored in the dedicated folder $CONFIG_DIR, so you can have a different configuration for each project. The script itself takes only one parameter: a list of all the source files that must be included in the book. Lines 6 to 12 create a work directory and copy the file list inside it. The files must be written in the file list with their absolute paths, in the order in which they must appear in the book.
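For reference, the chapter list passed as $1 is just a plain text file with one path per line, something like this (the paths here are invented):

```
/home/marco/books/guide/chapter_01.t2t
/home/marco/books/guide/chapter_02.t2t
/home/marco/books/guide/chapter_03.t2t
```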

Lines 13 to 17 of the script create a single source file (source_tmp) that contains the Include command loading all the txt2tags preprocessing directives (line 16) and then the content of all the individual files, in the right order but without the Include directives that are needed when processing them individually (line 17).

Line 19 runs a separate script, replace_urls_with_refs.pl, that adds the cross-reference numbers to the book text and dumps the result into another temporary file, source_with_refs. This script, not included here for brevity (and because you can do without it if, unlike me, you don’t need cross-references), only does two things. First, it reads a file in the $CONFIG_DIR folder that contains, one per line, all the URLs mentioned in the source files and the corresponding captions, in this format:

  http://www.greenparty.org.uk/news/2851 | Windows Vista? A “landfill nightmare”

Next, replace_urls_with_refs.pl reads source_tmp and, whenever it finds a line like:

the UK Green Party officially declared Vista... a ["landfill nightmare" http://www.greenparty.org.uk/news/2851]

generates the right cross-reference number and puts it right after the text associated to the link itself, writing everything to source_with_refs. You can see the effect in the last figure of part 2 of this tutorial.

After all this pre-processing, we can finally run txt2tags to produce a LaTeX file (line 21), but right after that we need to put back square brackets in place of some temporary markup generated by replace_urls_with_refs.pl (lines 22/23).

The next part of the script, up to line 32, removes the default LaTeX header and footer created by txt2tags, replaces them with those stored in the $CONFIG_DIR folder and dumps everything into complete_source.tex: this move allows you to declare whatever LaTeX class you wish to use (I used Memoir), or to give any other LaTeX instruction in the header, without any interference or involvement from txt2tags. Sometimes I use line 30, also optional, to execute other post-processing commands on the LaTeX source that, for any reason, are not convenient to run earlier.

The two invocations of pdflatex in lines 33/34 finish the job: the first creates a draft of the book to calculate page numbers and other data, the second produces the final PDF with a clickable table of contents and all the other goodies the Memoir LaTeX class can handle. The script in line 37 scans all the source files again to produce the HTML list of references.

Summing all up

I haven’t described all the gory details and auxiliary scripts at length because my main goal with this article is to introduce a txt2tags-based way of working. As I already said, this general method is quite simple but, in spite of this or maybe just for this reason, I find it very flexible and powerful. Two great Free Software applications, txt2tags and pdflatex, plus about one hundred lines of code in three separate scripts, can produce print-ready digital books and/or all the HTML code you need to make an online version or simply an associated website. Besides, you can easily add to the mix programs like curl or ftp to upload everything to a server. Personally, my next step will be to extend make_book.sh to generate OpenDocument files thanks to OpenDocument scripting.
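To illustrate the curl idea: uploading the finished PDF to a server could be sketched like this (host, credentials and remote path are all made up, and the echo keeps it a dry run; remove it to actually transfer the file).

```shell
#!/bin/sh
# Hedged sketch: push the generated PDF to a server with curl.
# Server name, user and remote path are placeholders.
PDF=complete_source.pdf
echo curl -T "$PDF" --user "marco:secret" "ftp://ftp.example.com/books/"
```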

How to transform (almost) plain ASCII text to Lulu-ready PDF files, part 1

Many people write far more now that they are constantly online than in the pre-Internet age. Most of this activity is limited to Web or office-style publishing. People either write something that will only appear inside some Web browser or a traditional “document”, that is a single file, more or less nicely formatted for printing. Very often, however, they don’t do it in the most efficient way.

The most common solution for the first scenario is still to write HTML or Wiki-formatted content in a text editor or, through a browser, directly in the authoring interface of a CMS like Drupal or WordPress. The other approach is even closer to the typewriting era, since it’s limited to using a word processor like OpenOffice. Both methods involve too much manual work for my taste, especially if you often want to reuse or move content from one format to the other.

Since I write a lot for both of the scenarios above and some more, some time ago I realized that I needed a more efficient and flexible workflow: something that was as close as possible to “write ONCE, publish anywhere, re-mixing and processing already written stuff in any possible way without getting mad along the way”. I wanted to write QUICKLY, without thinking at all of where or in which format the text would end up, while being prepared for all cases, from blog to book. I also wanted to use only Free Software that would run quickly even on old computers, with little or no configuration, if necessary on any operating system. Finally, I wanted the possibility to manage, search and process all my writings automatically, with command line utilities or shell scripts.

While I must admit I’m not there yet (especially when working on commission with very particular requirements) I already am pretty close to it in most cases. The rest of this article explains which software I chose and some scripts I wrote to work in this way, that is to write stuff only once and then convert it to a publishing-quality PDF or to HTML with just a few commands.

The first (easy) choice I had to make was: which file format should I use? I am a huge fan of the OpenDocument format (ODF), also because ODF is very easy to hack. However, the requirements above immediately exclude it as a source format for most of the stuff I write. The natural Free as in Freedom format for producing good PDF is still TeX or LaTeX, but I wanted HTML and OpenDocument as final options, which aren’t easy to obtain starting from TeX. Besides, I wanted to write, quickly, text that would already be highly readable in its native format, without too much markup in the way. The obvious conclusion was that I should write plain text marked up with a simple, wiki-like syntax such as reST, Markdown or txt2tags.

I chose the latter for two reasons. First, it has very good export to all the formats I need (LaTeX, plain text, MediaWiki and HTML) with the exception of ODF, which is, however, relatively easy to add, at least conceptually. Above all, however, txt2tags is simple. Its markup is very readable and easy to learn, but that’s not its biggest quality. What I like is that the actual software consists of one small Python script that runs in a graphical interface or at the prompt with a few options, without depending on any third party library or additional module. As long as your operating system supports Python, you only need that script and any text editor to work.

Sure, you need other software to generate PDF files from LaTeX, and auxiliary shell scripts for pre- and post-processing like those I describe below, but (unlike what I found in other markup systems) that’s all Free Software that’s guaranteed to be packaged in almost all Gnu/Linux distributions (including server-oriented ones, for automatic remote processing!) and also available for Windows. Besides, being a command line tool that can accept text from STDIN or send it to STDOUT, txt2tags integrates perfectly with any other script-based text processing procedure one may need.

Ultra-quick intro to txt2tags syntax, pros and cons

The markup syntax of txt2tags (see its online demo) leaves the source text very readable. Headers have one or more equal signs at the beginning and end of the line. Non-numbered list items start with a dash, numbered ones with a plus. Hyperlinks are enclosed in square brackets, asterisks delimit bold text and slashes mark italics. To build tables you enclose the content of each cell in pipe signs (|). Comments start with a percent sign, and preprocessing directives with a percent sign followed by an exclamation mark (%!). The only two things I care about that txt2tags doesn’t support natively are footnotes and cross-references to tables and figures. For footnotes there’s one workaround in this tutorial and one in the txt2tags configuration file of S. D'Archino. Cross-references are (relatively speaking) much more complicated to add, but are still possible by generalizing the approach described in the final part of this article, if you really need them.
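A short sample shows most of the syntax just described (the URL is a placeholder):

```
= A header =

- a non-numbered list item
+ a numbered list item

Some **bold** text, some //italic// text and [a link http://example.com].

| cell one | cell two |

% this is a comment, ignored in the output
```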
