Yesterday Sergio, a user of OpenOffice Impress, sent to the OpenOffice.org discussion list his list of the “Major Gaps of OpenOffice Impress 3.3 vs. Microsoft Office PowerPoint”.
The scripts and tricks in the ODF scripting section of this website show how to create office-ready texts, presentations and spreadsheets automatically, in the OpenDocument format, which is a worldwide standard. This is all many people need to work today. Sometimes, however, it’s still necessary to either print those documents, or exchange them with somebody in other formats, like PDF or those of the older releases of Microsoft Office (newer releases of this program are already partially compatible with OpenDocument through free plugins, so if your partners have those versions they should really use those plugins, instead of bothering you with requests for drug-like, legacy file formats, but that’s another story).
Of course, if you need to print or convert to other formats only once in a while, there’s no reason not to do it from OpenOffice. The simple tricks explained below, however, are a life-saver when you need to do this many times, and of course you’d like your computer to do it for you while you have a coffee or something.
On Linux systems it is easy to do all this, and even send the converted files via email, automatically. Let’s assume that you have an OpenDocument text, spreadsheet or presentation already sitting in some folder, waiting to be processed.
Both printing and conversion to PDF, HTML or MS Office formats from the command line need OpenOffice to work. In the second case, the reason is that what does the actual work is one of the OpenOffice macros linked below: when you launch OpenOffice, it executes that macro on the file indicated by the user and then exits. Macros are not needed for printing because OpenOffice has dedicated options for that. Usage of OpenOffice from the command line is explained on the OOo wiki. In a nutshell, this is the correct syntax:
soffice -invisible "macro://path-to-macro($FILE)"
On some systems, you may need to provide the complete path to the soffice program. The -invisible option is what makes OpenOffice start without a graphical interface. The file to process must be passed as argument ($FILE) to the macro.
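To run the same macro on a whole folder, the command above can be wrapped in a small loop. This is only a minimal sketch: `macro_url` and `convert_all` are names I made up for the example, `path-to-macro` is still the placeholder used above, and the SOFFICE variable is just a convenient way to supply the complete path when needed. The macro URL is quoted so that the shell does not try to interpret the parentheses.

```shell
#!/bin/sh
# Sketch: run one OpenOffice macro on every OpenDocument text in a folder.
# MACRO is a placeholder: replace it with the real location of your macro.
MACRO="${MACRO:-path-to-macro}"
# Set SOFFICE to the complete path if soffice is not in your PATH.
SOFFICE="${SOFFICE:-soffice}"

# Print the macro URL for one file ($1).
macro_url() {
    printf 'macro://%s(%s)' "$MACRO" "$1"
}

# Call the macro once for each .odt file found in the folder $1.
convert_all() {
    for FILE in "$1"/*.odt; do
        [ -e "$FILE" ] || continue    # the folder may contain no .odt files
        "$SOFFICE" -invisible "$(macro_url "$FILE")"
    done
}

# Usage: convert_all /path/to/folder
```

OpenOffice is started once per file here, which is slow but keeps the script trivial.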
The command above is all you need if you are working on a complete GNU/Linux desktop, that is a system that also has a graphical interface server (called X server). For the record, you can do the same thing in Windows with a batch script like this (taken from an OOoforum thread):
@ECHO OFF
"c:\program files\OpenOffice.org1.1.4\program\soffice" -invisible "macro:///Standard.Module1.ConvertToPDF(%1)"
When you want to work inside a Linux Web or print server, instead, that is on a computer where X was never installed, you need to set up some extra variables before launching OpenOffice, otherwise it won’t start. This is how to do it (the explanation for the extra commands is in the thread in which I found them, which also includes instructions on how to install OpenOffice on a remote server):
export PATH=$PATH:/usr/bin/X11
export LANG=en_US
export HOME=/var/www
xvfb-run -a /usr/bin/soffice -invisible "macro://path-to-macro($FILE)"
Please note the extra piece in the actual command, that is in the last line above: `xvfb-run -a`. Xvfb is a smaller X server used in special situations like this, when a full X wouldn’t be installable. Also, don’t forget that, depending on the server configuration and your actual needs, you’ll probably have to change the LANG and HOME variables.
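If the same script must run both on a desktop and on a server without X, it can decide by itself whether the `xvfb-run -a` wrapper is needed. The sketch below assumes that an empty or unset DISPLAY variable means "no X server available", which is a common convention but not a guarantee. `run_macro` and the XVFB_RUN variable are names invented for this example, and the PATH, LANG and HOME settings shown above still have to be exported separately.

```shell
#!/bin/sh
# Sketch: start OpenOffice directly when an X display exists,
# or through xvfb-run when it does not (assumes xvfb-run is installed).
SOFFICE="${SOFFICE:-/usr/bin/soffice}"
XVFB_RUN="${XVFB_RUN:-xvfb-run}"

# $1 is the complete macro URL, e.g. "macro://path-to-macro(/some/file.odt)"
run_macro() {
    if [ -n "${DISPLAY:-}" ]; then
        # A graphical session is available: no wrapper needed.
        "$SOFFICE" -invisible "$1"
    else
        # No X server: let xvfb-run provide a virtual one.
        "$XVFB_RUN" -a "$SOFFICE" -invisible "$1"
    fi
}
```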
Show me the macros!
The previous section explains how to run OpenOffice from the command line on Linux or Windows in order to execute any macro. Let’s now look at the actual macros we need to save in Microsoft or other formats. Several are available online.
Those with the best explanation, which includes details on how to install any macro in OpenOffice, are SaveAsPDF and SaveAsDoc. The beauty of these macros is that it is very easy to modify them to save in HTML or any other format that OpenOffice can handle! You just have to substitute the right values for the file extension (MYEXTENSION) and the filter name (MY_FILTER_NAME) in this part of the macro:
cFile = Left( cFile, Len( cFile ) - 4 ) + ".MYEXTENSION"
cURL = ConvertToURL( cFile )
oDoc.storeToURL( cURL, Array( _
    MakePropertyValue( "FilterName", "MY_FILTER_NAME" ) ) )
Another macro that saves an OpenDocument file in PDF format was posted to the Fedora mailing list. Whichever macro you choose, put it in a suitable folder, accessible from the script and user account that will use it, and replace the path-to-macro string above with the actual full path to the macro in the file system.
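When several output formats are needed, it helps to keep the extension and filter name pairs in one place. The lookup below is a sketch for Writer documents only: the filter names are the ones I know from OpenOffice Writer, but they may differ between versions and applications (Calc and Impress use other names), so check them against your installation. `filter_for` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: map a file extension to the OpenOffice Writer export filter name
# that would replace MY_FILTER_NAME in the macro.
# Assumption: names as used by OpenOffice Writer; verify on your version.
filter_for() {
    case "$1" in
        pdf)  echo "writer_pdf_Export" ;;
        doc)  echo "MS Word 97" ;;
        html) echo "HTML (StarWriter)" ;;
        rtf)  echo "Rich Text Format" ;;
        *)    echo "unknown extension: $1" >&2; return 1 ;;
    esac
}

# Usage: filter_for doc
```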
How to print or email OpenDocument files from the command line
In order to do this we just need two other command line options of OpenOffice (see here for the complete list or type `soffice -?` at a command prompt to get a complete listing):
soffice -invisible -p <documents...>
soffice -invisible -pt <printer> <documents...>
They both print all the specified documents. The only difference between them is that the first one uses the default printer, the second looks for the printer given as first parameter.
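The two forms are easy to hide behind one function that picks `-p` or `-pt` depending on whether a printer name was given. This is just a sketch: `print_docs` is a name chosen for the example, and the SOFFICE variable is only there to supply the complete path to soffice if needed.

```shell
#!/bin/sh
# Sketch: print documents on the default printer or on a named one.
# $1: printer name, or "" for the default printer; remaining arguments: documents.
print_docs() {
    if [ $# -lt 2 ]; then
        echo "usage: print_docs <printer or \"\"> <documents...>" >&2
        return 1
    fi
    printer=$1
    shift
    if [ -n "$printer" ]; then
        "${SOFFICE:-soffice}" -invisible -pt "$printer" "$@"
    else
        "${SOFFICE:-soffice}" -invisible -p "$@"
    fi
}

# Usage: print_docs "" report.odt budget.ods
#        print_docs laser report.odt
```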
Finally, if you also want your script to email the generated files on your behalf, you can use the text-based Mutt email client like this ($EMAIL_TEXT is a separate text file containing the text of the message):
mutt -s "$SUBJECT" -i "$EMAIL_TEXT" -a "$FILE_TO_BE_ATTACHED" -- "$RECIPIENT"
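For unattended use, a common alternative is to feed the message text to Mutt on standard input, so that it sends the message without opening an editor. The function below is a sketch along those lines: `send_file` and the MUTT variable are names invented here, and the `--` separator between the attachment and the recipient is what recent Mutt versions expect.

```shell
#!/bin/sh
# Sketch: mail one generated file with Mutt, without user interaction.
# $1 recipient, $2 subject, $3 text file with the message body, $4 file to attach
send_file() {
    [ -r "$3" ] && [ -r "$4" ] || {
        echo "cannot read message text or attachment" >&2
        return 1
    }
    # The body comes from standard input, so Mutt does not start an editor.
    "${MUTT:-mutt}" -s "$2" -a "$4" -- "$1" < "$3"
}

# Usage: send_file someone@example.org "Monthly report" body.txt report.pdf
```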
If you find any error in this page or have any suggestion, please tell me (but remove the numbers from the email address first!)
The opening session was a cool moment, both for the location (the Hungarian Parliament) and for the content. We started in the very hall of the Parliament. Incidentally, the first thing I noticed there has nothing to do with OO.o but is a general problem of the FOSS and programming worlds: of about 150 people in the hall, no more than 10% were women, even though OO.o and FOSS users certainly aren’t 90% male, are we? But I digress.
Slideshows are extremely popular as presentation and educational tools, but have a couple of serious problems. The first is readability: let’s admit it, many slideshows are almost unusable. One of the secrets to useful slideshows is terseness. Each slide should contain only a few short points or pictures which summarize the key concepts you want to transmit to the audience with that part of your talk.
The other big issue with slideshows is that GUI presentation software, be it PowerPoint, OpenOffice Impress, KPresenter or anything else, can be quite time-consuming and distracting, no matter how you use it. Writing bullets and sub-bullets as simple text outlines is much faster, even when you’re just pasting together notes you scribbled on your PDA, email fragments, quotes from Web pages or thoughts of the moment.
(Note: these are the comments appended to my original article, which I had to put in a separate page when I switched from Drupal to WordPress)
Just came to your site following a link from linuxtoday. Wow! This opens windows of opportunities! Somehow I’ve totally missed out on the fact that odt documents are just zip files. I’ve been reading a bit through some content.xml files. And it seems that it should be possible to use openoffice from a text editor just as fine.
The OpenDocument Format (ODF) is an internationally recognized open standard for digital office documents whose importance has also been acknowledged by Microsoft. ODF is good for a lot of reasons I have already explained in Everybody’s Guide to OpenDocument. However, there is also one more reason why ODF is great for everybody who must produce a lot of office documents, one that will be the subject of many posts on this website: ODF is really simple to generate or edit automatically. Even if you aren’t a professional programmer, it takes very little effort to put together a script that generates or processes in any way texts, presentations or spreadsheets in ODF format.
How the openness of ODF makes automatic generation of documents much simpler
Very often, we use computers to produce many different versions, every time with new data, of some reference text, presentation or spreadsheet. Changing that kind of file manually makes sense only if it happens once in a while. When it’s a regular activity, instead, it can become a huge waste of time. ODF, however, makes it very quick and easy to insert raw data into texts, spreadsheets or presentations with the slightest possible amount of manual work and without even running OpenOffice. This is possible because an ODF file is just a ZIP archive, with pictures and macros in their own folders and the actual text written, in XML format, inside a file called content.xml. Therefore, in order to create a new, 100% compliant ODF file with different data, tables or images, you only have to open the archive, process the text inside content.xml or put new pictures in their folder if necessary and zip everything again. You only need to use OpenOffice once, to create a template by hand if you don’t find a suitable one online.
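The unzip, edit and zip again cycle can be sketched in a few lines of shell. Everything specific here is an assumption made for the example: the `fill_template` name, the idea of typing a marker such as @NAME@ into the template by hand, and the availability of the Info-ZIP `zip` and `unzip` programs. The one detail worth noting is that the `mimetype` file is added first and uncompressed, since ODF consumers expect to find it at the start of the archive.

```shell
#!/bin/sh
# Sketch: replace a marker inside content.xml and rebuild the ODF file.
# $1 template file, $2 output file, $3 marker, $4 replacement text
# Assumptions: marker and replacement contain no "/", "&", "<", ">" or
# backslashes, and the marker was typed into the template as one plain word.
fill_template() {
    template=$1
    output=$2
    workdir=$(mktemp -d) || return 1
    unzip -q "$template" -d "$workdir" || return 1

    # Edit the text of the document.
    sed "s/$3/$4/g" "$workdir/content.xml" > "$workdir/content.new" &&
        mv "$workdir/content.new" "$workdir/content.xml" || return 1

    # zip runs from inside workdir, so the output path must be absolute.
    case "$output" in
        /*) ;;
        *) output="$PWD/$output" ;;
    esac
    rm -f "$output"
    (
        cd "$workdir" || exit 1
        zip -q -0 -X "$output" mimetype &&      # first entry, stored uncompressed
        zip -q -r -X "$output" * -x mimetype    # then everything else
    ) || return 1
    rm -rf "$workdir"
}

# Usage: fill_template invoice_template.odt invoice_0042.odt @NAME@ "Sergio"
```

Real data should be XML-escaped before it is inserted, which this sketch does not do.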
The power of script-based ODF processing
You could perform repetitive generation and editing of ODF office files even manually, with a text editor like Notepad, Emacs or vi. The real power of ODF, of course, is in the fact that you can (and should) do all that processing automatically, with very simple shell or Perl scripts, that is with tools that are included in any GNU/Linux distribution but can also work on Windows and Mac. The main advantages of this approach to office document processing are:
- it works even without OpenOffice, so it could even run on a server
- there is no need for any relational database, but you can use one if necessary
- learning to do these things with shell scripts instead of OpenOffice macros:
- above all, it’s much simpler (and faster) than you’d think!
The last point is the most important. Using the method explained here everybody with just a basic grasp of shell scripting can generate, modify or analyze hundreds of ODF text documents with just a few minutes of easy coding.
Of course, this approach is not really flexible, scalable or really robust, unless you add lots of code for error management, but the idea here is not to develop industrial strength solutions. If that’s what you need, you’ll have to either use real XML based tools like Odfpy or go straight to the source, the book OpenDocument Essentials by J. David Eisenberg, that you can also purchase at Lulu.com.
This said, there are tons of cases where heavyweight tools like those aren’t worth studying, installing and deploying, but people still end up wasting many hours on repetitive edits. Learning how to write quick and dirty shell scripts that can open and update an ODF file is an easy but huge time saver in such situations.
Here are some of the ODF scripting recipes that you’ll find on this website in the next days (but if there are other recipes that you would like to see published, just ask and if possible I’ll write them!):
- create ODF text invoices from databases or plain text files
- create multiple-choice test printouts, each with a different set of randomly chosen questions
- transform plain text outlines into simple presentations
- fill spreadsheets with data taken from databases or system logs
(note: some of these posts are updated excerpts of articles originally written for Linux Format, and are republished here with their permission)
Articles on how to create OpenDocument invoices already exist, but almost always they require you to start and use OpenOffice manually each time. Here, instead, I’ll show how to have your computer do all your OpenDocument work for you.
This is a way to import multiple pictures into an empty text with OpenOffice Writer, one picture per page, with a database report.
- install the Sun Report Builder extension
- create a database in the same directory as the pictures
- enter or import in the database the file names of the pictures
- make sure that those names don’t contain special characters or spaces
- use absolute, full path names to step around a bug in the builder
- in the database form, bind a picture control to that field.
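The list of file names to enter or import in the database can be prepared with a short script that already applies the two precautions above: full absolute paths, and no special characters or spaces in the names. This is a sketch under my own assumptions: `list_pictures` is a made-up name, only .jpg and .png files are considered, and "special" is taken to mean anything other than letters, digits, dots, underscores and hyphens.

```shell
#!/bin/sh
# Sketch: print the absolute path of every picture in folder $1, one per line,
# ready to be imported in the database. Files with unsafe names are not
# listed; a warning about each of them goes to standard error.
list_pictures() {
    dir=$(cd "$1" && pwd) || return 1
    for f in "$dir"/*.jpg "$dir"/*.png; do
        [ -e "$f" ] || continue
        name=${f##*/}
        case "$name" in
            *[!A-Za-z0-9._-]*)
                echo "skipping (rename it first): $name" >&2 ;;
            *)
                printf '%s\n' "$f" ;;
        esac
    done
}

# Usage: list_pictures ~/holiday_pictures > pictures.csv
```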
This is a synthesis of a discussion on the OpenOffice users list. If you know more efficient methods, please let me know (mfioretti, at nexaima; dot net).
The second day of the Plugfest followed the same general scheme of the first one (covered in a separate page): a non-technical introduction followed by lots of hacking, feature analysis and product previews.
A representative of the Spanish Ministry of Presidency, Miguel Angel Amutio Gomez, started the day explaining the crucial points of the Spanish law 11/2007: the right for everybody to use whatever digital technology they like best and the obligation for all Public Administrations to avoid discrimination of citizens based on their technological choices. In order to make this possible, the law stresses the importance of open standards, setting the goal that all e-government services and documents become available at least through such standards. In this context, Amutio said, the Spanish National Interoperability Framework (NIF) that A. Barrionuevo presented on the first day becomes an essential legal test for all Spanish organizations.
Besides the law itself, the most interesting part of this presentation was the results of an analysis made using the NIF: it turns out that 30% to 40% of the about 400 digital standards already used for e-government in Spain come from (in decreasing order): IETF, OASIS, W3C, ISO, and Microsoft.
After this stimulating factoid, the conference went back to strictly technical topics. Michiel Leenars of the Opendoc Society showed us all you can do with OfficeShots. This is a rendering farm for office documents, primarily aimed at developers and power users but useful for everybody. When you upload your file to the OfficeShots server it will show you how it will look with many combinations of office suites and operating systems. If needed, an anonymizer kindly provided by the lpOD folks (see Day One report) scrambles all the text to avoid online dissemination of sensitive data. The most advanced use cases for OfficeShots include:
- check before buying which ODF software works best with the kind of documents you actually need
- design styles for corporate templates that render correctly on all office software products
- (for developers) test the interoperability of your program with other office suites
There is also a test suite gallery to help developers to test new versions of their products quickly. Of course, OfficeShots is an Open Source project to which everybody can contribute.
Microsoft and OpenDocument
Before lunch, Mario Wendt summed up the current status of Microsoft support for OpenDocument. Here are the main points:
- Office 2010 will include over 1000 ODF-related bug fixes
- Implementer notes for Office 2007 and Office 2010 requested by the community already available (even if only as single PDF files)
- Study of the differences between ODF 1.1 and 1.2 in order to provide appropriate feedback
- Ongoing work to contribute to finish OpenFormula
- Commitment to support ODF 1.2 nine months after its formal approval by ISO
OpenDocument on Android!
Using ODF documents on the road becomes easier and easier every year, at least for text ones. Oliver Mas presented the Android port of ODFMovil, a J2ME ODF viewer developed by Cenatic under an Apache 2/GPL2 license. The port, which was also intended as a pilot to estimate the complexity of porting bigger applications, was relatively easy due to the four-layer architecture of ODFMovil: core utilities, file management and compression, XML parsing, GUI. As it turned out, only the first and third layers could be ported. File management and graphical interface had to be rewritten from scratch. More details are available in Oliver’s ODFMovil slides.
OpenDocument for financial firms
There is a part of the OpenDocument standard that is really critical for its adoption, at least in some markets: the financial formulas that are used to calculate things like interest rates, mortgage payments and similar amenities. As Rob Weir put it, “an economist whose predictions aren’t wrong more than 10% is a genius, but a banker making a 1% error is a criminal”, so it’s essential that the formulas in OpenDocument don’t make any mistake. This is more complicated than it seems, to the point that a whole session of the Plugfest was reserved to this topic.
There are at least five ways to calculate the number of days between two dates, and in real life you can find them mixed in different ways, depending on the traditions and regulations of every country (not to mention leap years): sometimes all months are assumed to last 30 days, and the year can be 360 days instead of 365. Since this changes how much you still owe to your bank after a few years of mortgage, it is absolutely necessary that all ODF compliant applications are 100% interoperable from this point of view. This means that each of them must declare without ambiguities what method it uses for day counts and how it handles (at least) the YEARFRAC() formula, even when it appears in a spreadsheet generated by another application.
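To see how two day-count methods can disagree, here is a small sketch that compares the actual number of days between two dates with the "30E/360" convention, in which every month counts as 30 days. It only illustrates the problem: it is not how any office suite implements YEARFRAC(), and 30E/360 is just one of the variants in use. The function names are mine, and the actual day count relies on the standard Julian Day Number formula for the Gregorian calendar.

```shell
#!/bin/sh
# Sketch: compare two day-count methods. Dates are passed as separate
# numbers WITHOUT leading zeros: year month day year month day.

# Julian Day Number of a Gregorian date ($1 year, $2 month, $3 day).
jdn() {
    a=$(( (14 - $2) / 12 ))
    y=$(( $1 + 4800 - a ))
    m=$(( $2 + 12 * a - 3 ))
    echo $(( $3 + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045 ))
}

# Real number of days between two dates.
days_actual() {
    echo $(( $(jdn "$4" "$5" "$6") - $(jdn "$1" "$2" "$3") ))
}

# 30E/360: a 31st day counts as the 30th, every month lasts 30 days.
days_30e_360() {
    d1=$3
    d2=$6
    if [ "$d1" -gt 30 ]; then d1=30; fi
    if [ "$d2" -gt 30 ]; then d2=30; fi
    echo $(( 360 * ($4 - $1) + 30 * ($5 - $2) + (d2 - d1) ))
}

# Usage: days_actual 2023 1 15 2023 7 15
#        days_30e_360 2023 1 15 2023 7 15
```

For the six months from 15 January to 15 July 2023 the two functions differ by one day (181 actual days against 180), and the year used as divisor (365 or 360) changes the resulting fraction further.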
Change tracking and handling of unique features
Another important moment of Day Two was the hacking session on management and interoperability of change tracking. Mario Wendt presented a test case that all developers could create and test on the spot, reporting the result on a common wiki: a text document with a fixed layout mixing a table, several paragraphs and numbered lists, on which everybody had to perform a well defined series of edits.
Saturday morning, instead, was “unique features” time. There are functions of some ODF capable suites that are simply absent from other programs: notable examples are music shapes in KOffice or SmartArt graphics in Microsoft Office. What must happen when you use such features in an ODF document that you want to share with users of other software? The session theme was that developers could, as a minimum, guarantee that:
Careful readers of my ODF Day One report will immediately notice the intersection of this issue with the topic of my own talk and with what Rob Weir said about metadata interoperability. For the record, Rob also pointed out that the problem presented in my talk could also happen in another way: what if you insert into an OpenDocument file a macro in an open programming language (e.g. Python) that implements a patented algorithm? It will be really interesting to see how Public Administrations and other large organizations will handle all these issues in the medium/long term (if they see them in the first place, of course).
Converting wikis to books in OpenDocument format
The last thing I’d like to report from the Plugfest is a cool use of lpOD that Louis showed us on Friday: it turns out that you can use this software to export a whole wiki, or selected parts of it, to a book in OpenDocument text format. Besides a flexible templating system, the software gives you automatic generation of the Table of Contents and support for footnotes, tables and images. This could be a very handy publishing tool for schools writing their own textbooks or any other organization that creates text content working through the Internet!
The ODF Plugfest is a Conference whose goal is to achieve the maximum interoperability between competing applications, platforms and technologies in the area of digital document sharing, and to promote the OpenDocument format (ODF). This page, as the others that will follow on this website, is a short technical summary, primarily aimed at developers, of what happened during the first day of the conference. Later next week I’ll also post a non-technical summary of the whole event at the Stop.
My own talk, which you can download at mfioretti.com, was about a problem I first saw in 2006/2007: how do you prevent proprietary components from “polluting” open container formats like ODF?
Alberto Barrionuevo explained how Opentia contributed to the Spanish National Interoperability Framework (NIF). This is a very interesting work that could and should be replicated in other countries: Opentia took the full text of the 2007 Spanish law about e-government and translated it into a mathematical format in order to match, in the most unambiguous possible manner, each legal requirement to equivalent technical ones.
The result is a huge spreadsheet that implements a finite state machine and tells you if and how admissible each of about 550 protocols, file formats and programming languages is for e-government usage in Spain (for the record, about 80% of them pass the test, at least partially).
Rob Weir of IBM summed up the status of the next version of the standard. ODF 1.2, which is almost done, will be divided in three parts: one for the core schema, one for the container and one for OpenFormula (do you remember that the first generation of ODF compliant spreadsheet suites lacked formula compatibility? This should fix that problem for good). New features will include digital signatures, support for RDF capabilities (see below) and native tables in presentation slides. An Interoperability demonstration of ODF 1.2 will take place at the OOoCon Conference in Budapest next September. Rob also mentioned that everybody can send in suggestions for the next version of the standard, that should include things like modularization, web profiles, enhanced SVG support and Xform integration. You can either answer OASIS calls for public comments, join the OASIS ODF TC or implement ODF 1.2 and send feedback.
Later on, Rob also introduced another theme that could and should get a lot of attention in the next years, and that is also somewhat related to what I said in my own talk: ODF metadata and interoperability. What should happen if your editing software loads an ODF file containing metadata that that software doesn’t understand? Should it preserve or ignore those metadata?
In some cases the answer is easy: metadata that behave like visual attributes, e.g. the bold tag (where the attribute, boldness, is separated from the data or content it applies to), can be removed or ignored without any real damage. Go beyond that, and interoperable metadata are much harder to achieve.
What if, for example, a document is pasted into another one that has a different value for the same metadata, for example the one indicating who in the organization is responsible for approving that text? An even more interesting case is “Should I digitally sign a document that contains metadata I can’t see?”
Another interesting moment of the day was when Jos van den Oever presented the work for RDF support for ODF 1.2 in KOffice. Generally speaking, RDF should help to add to the text enclosed in a document information about the meaning of that text, in a format that is directly and easily readable by a computer. Consider, for example, a sentence like “Paris is hot”. If it’s in plain text, for a computer it’s very difficult to understand, even by looking at the rest of the document, if it means that temperatures in the French capital are high or that Paris Hilton (or the mythological Trojan prince???) is sexy. Adding an RDF data triple to the Paris string, that labels it as a city, would eliminate that ambiguity. RDF could deeply change our definition of “document”. If all your files were tagged in this way, it would become much easier (and portable) to ask your computer questions like “find me all the cases discussed by my law firm that involved unemployment law, but not in its home town”.
The end goal is to get to a whole desktop that can use RDF, where the user can read and write docs with RDF, directly cut and paste them between applications. The current work on KOffice will allow it to show the user all the RDF triples and the corresponding text strings in a document, and to use those data to check and show locations on digital maps, or export appointments or phone numbers straight into calendars or address books.
Day one of the ODF Plugfest ended with a presentation of several interesting tools that generate, convert or analyze automatically all kinds of ODF files:
- ASPOSE.WORD, a .NET and Java library for document processing in ODF and many other formats in cloud, single server or desktop environments
- lpOD, an even more interesting (for my personal use, that is) library to create ODF documents. lpOD is already usable in Python (Perl and Ruby will follow) and comes with an online cookbook
- ODFDom Library, another library, in Java this time, written to create, access and manipulate ODF files without requiring detailed knowledge of the ODF specification
(note to all Plugfest participants and all other readers who want to comment or add something to this page: I’ll enable comments from anonymous users as soon as I can configure the corresponding spam filters, but in the meantime please register or send me an email at mfioretti, at nexaima dot net. And if you want to translate this page, just let me know)