Saturday, July 14, 2007

Mid-term Report

It's hard to believe, but it's already time for the middle-of-the-term progress report for Summer of Code, and that applies to us OpenMRS interns as well. So, here goes.

I'm back from Florida (pictures at http://pics.discosoup.net/FL.html for anyone interested) and back to work. The question I've been asking myself this past week is "How can I make things simpler?" in terms of form creation in OpenOffice.org. OpenOffice's XForms component is much less polished than Microsoft InfoPath, so creating a form takes a few more manual steps. The steps aren't difficult, but they're tedious and time-consuming, and I felt they might seem daunting to OpenMRS implementors and developers converting forms to OOo, so I wanted to see if I could cut them out.

Data Types


The first tedious step involves manually entering the data type for each XML element in the form when you create a binding. While InfoPath reads in the XML schema directly and does this for you, OOo does not. Users are required to specify the data type for each binding, and this requires either looking at the schema or at InfoPath to find out if a control should be a simpleType, simpleContent, Integer (decimal in OOo), _infopath_boolean, etc.

My first thought was that I could write either a Python script or some sort of XSLT trickery to manipulate the XML of the .odt files and associate entries in the form with the proper data type from the schema (.odt files contain a content.xml, which has both the styles that define the appearance of the document and the XML template + bindings), but this was too complicated. I looked at the output of submitting an OOo form to an XML file and noticed that none of the data type information (simpleContent, simpleType, etc.) was saved. I compared this with the output from an InfoPath form, and it was the same there as well - no data type information saved. So I asked my mentor about this and we came to the conclusion that data types are only necessary for validating user input in the form itself and don't need to be kept track of in the bindings. While it's a good idea to make sure certain fields are restricted to integers or dates, it's not necessary for OpenOffice to keep track of whether checkboxes or radio buttons are simpleContent or simpleType - the output will be the same. This cuts a lot of tedium from the form creation process.


Binding Names


Another tedious part of form creation is having to rename each binding after you create it in OOo. Creating new bindings results in bindings called "Binding 1", "Binding 2", etc. These names are somewhat less than descriptive, and I had been renaming them to their XPath expression equivalent - so "Binding 1" becomes "patient/patient.sex", etc. This is functionally irrelevant, mind you - OOo doesn't care what a binding is named; the name is just for user reference. Still, I prefer to have meaningful names so they can be easily referred to later.

I opened content.xml in jEdit to see how binding names were handled, and here's what I found:

<xforms:bind nodeset='patient/patient.sex' type='simpleContent' id='Binding 1'/>
is an example of a binding with the default name. This is later referenced in the definition of a control style:

<form:radio form:control-implementation='ooo:com.sun.star.form.component.RadioButton' form:label='M' form:name='nameOfRadioButton' form:image-position='center' form:id='control19' form:value='M' xforms:bind='Binding 1'>
It should be fairly easy to write a script that replaces every instance of 'Binding X' with the "nodeset" value, so that cuts this step out of form creation in OOo as well. All that's left now is to drag elements from the XML tree to the document, change the control type and label/properties, then position them to fit where you want them to - pretty easy stuff.
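
A minimal sketch of what such a renaming script might look like in Python is below. It is only an illustration under a few assumptions: it works on the text of content.xml after it has been pulled out of the .odt, it assumes bindings are defined and referenced exactly as in the two snippets above (an id attribute on xforms:bind, and an xforms:bind attribute on each control), and it assumes no two bindings share the same nodeset. The function name rename_bindings is made up for this example.

```python
import re

# Matches an <xforms:bind ...> element; attribute order is not assumed.
BIND_TAG = re.compile(r"<xforms:bind\b[^>]*>")
# Matches one name='value' or name="value" attribute.
ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(["'])(.*?)\2""")
# Matches a bare id=... (not form:id) or an xforms:bind=... reference.
REFERENCE = re.compile(r"""(?<![\w:-])(id|xforms:bind)\s*=\s*(["'])(.*?)\2""")


def rename_bindings(content_xml: str) -> str:
    """Replace default binding ids ("Binding 1", ...) with their nodeset."""
    # First pass: collect id -> nodeset for every binding definition.
    renames = {}
    for tag in BIND_TAG.findall(content_xml):
        attrs = {name: value for name, _quote, value in ATTR.findall(tag)}
        if "id" in attrs and "nodeset" in attrs:
            renames[attrs["id"]] = attrs["nodeset"]

    # Second pass: rewrite the definitions and every control that refers to them.
    def swap(match):
        name, quote, value = match.groups()
        return f"{name}={quote}{renames.get(value, value)}{quote}"

    return REFERENCE.sub(swap, content_xml)
```

Values that are not known binding ids are left alone, so attributes such as form:id='control19' pass through unchanged.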


template.xml and FormEntry.xsd extraction


Each OpenMRS InfoPath form contains two important files: template.xml and FormEntry.xsd. Template.xml has the actual XML data that's edited by InfoPath. When a form is published, form data entered in the UI is written to the appropriate XML tag in this file. FormEntry.xsd is InfoPath's version of XML schema (one of the reasons OOo doesn't read this in - it's not quite a proper schema, but MS's own version). This contains all of the data type values for each element as well as valid HL7 values.

It's necessary to get both of these files for the form you want to convert to OOo. InfoPath's .xsn files are CAB files, so you need a CAB extractor to get at them. I've found that WinRAR works on Windows, Unarchiver on OS X, and cabextract on Linux. You can find links to these in my wiki instructions, and I'll soon be modifying the instructions to include a section on extracting these files.
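
For the Linux case, the extraction could be scripted along these lines. This is just a sketch: it assumes cabextract is installed and on the PATH, uses its -d option to choose the output directory, and the function names and example paths are made up for illustration.

```python
import subprocess


def cabextract_command(xsn_path: str, out_dir: str) -> list:
    """Build the cabextract call that unpacks an .xsn (CAB) file into out_dir."""
    return ["cabextract", "-d", out_dir, xsn_path]


def extract_xsn(xsn_path: str, out_dir: str) -> None:
    """Run cabextract; raises CalledProcessError if the extraction fails."""
    subprocess.run(cabextract_command(xsn_path, out_dir), check=True)
```

After something like extract_xsn("MyForm.xsn", "MyForm"), template.xml and FormEntry.xsd should be sitting in the output directory alongside the rest of the form's files.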


What's Next


After I finish the wiki instructions on form creating in OOo, there are two big tasks to tackle: loading patient data from the webapp and recreating InfoPath's taskpane.

I've been thinking about loading patient data a lot these past few days. Currently, when a user loads a patient's form from the webapp, it sends an .infopathxml file, which is a filled-out version of template.xml with instructions at the top so InfoPath knows which form to open. Paul is very adamant that OOo also have this functionality so it fits in with the FormEntry workflow just like InfoPath. I'm confident it can be done, and I've been thinking about several solutions:

As mentioned in a previous post, you can add local or remote instance data to a form in OOo. This can load patient data into your form, but it involves manually clicking "Add instance data" and selecting the XML file containing patient data. Doable, but not at all an elegant, easy solution. I thought perhaps I could write a macro that automatically does this, but that would require users to have a local copy of the form they wish to do this with, and that's not very good either (currently forms are stored on a server and InfoPath opens them remotely).

After a discussion with Justin Miranda on IRC, an idea popped into my head. Since OOo's .odt files are just zip files containing several XML files, and since all template data is saved in content.xml, why not have a script on the server that grabs the appropriate form, replaces the blank template data in content.xml with the contents of the filled-out .infopathxml file containing patient data, and sends this to the user? Should be quite doable in Python, and from the user's perspective, the only thing that changes is that OOo opens instead of InfoPath. OOo can also open files from remote URLs, making this even easier.
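
To make the idea concrete, here is a rough Python sketch of the server-side step. It is not a finished implementation and rests on some assumptions: that the blank template sits inside an xforms:instance element in content.xml, that the instructions at the top of the .infopathxml file are an XML declaration and processing instructions that can simply be dropped, and that everything is UTF-8. The function name fill_form is made up for this example.

```python
import io
import re
import zipfile

# Assumption: the form's blank template lives inside <xforms:instance> in content.xml.
INSTANCE = re.compile(r"(<xforms:instance\b[^>]*>).*?(</xforms:instance>)", re.DOTALL)
# XML declaration and processing instructions at the top of the patient file.
PROLOG = re.compile(r"^\s*(?:<\?.*?\?>\s*)+", re.DOTALL)


def fill_form(odt_bytes: bytes, patient_xml: str) -> bytes:
    """Return a copy of the .odt with its instance data replaced by patient_xml."""
    body = PROLOG.sub("", patient_xml)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                # Swap only the first instance's contents, keeping its tags.
                text, count = INSTANCE.subn(
                    lambda m: m.group(1) + body + m.group(2), text, count=1)
                if count == 0:
                    raise ValueError("no <xforms:instance> found in content.xml")
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps entry order and compression,
            # so the "mimetype" entry stays first and uncompressed.
            dst.writestr(item, data)
    return out.getvalue()
```

The webapp would then send the returned bytes to the user as an .odt download in place of the .infopathxml file. Copying each entry with its original ZipInfo matters because OpenDocument packages expect the mimetype entry to come first and to be stored uncompressed.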

Justin also mentioned the idea of having one form for everything and one XML file per patient. That way, every patient would have one form and one XML file. While this would simplify things IMMENSELY, it would require a lot of refactoring of the way OpenMRS works, and you could run into problems caused by adding/removing fields from the forms and would have several different versions of the form running around. Also, both the form and XML would be quite large, and this could be an issue for users who only need to edit a small section of data.

As far as InfoPath's taskpane is concerned, I'm not quite there yet. The taskpane is a tab in InfoPath that executes some JavaScript to pull in remote data via HTTP. This allows users to select information (such as the location of a patient encounter) from a remote URL, all from within InfoPath. While all of this can be accessed via a regular web browser, it's convenient to have everything in one application. The quickest and dirtiest solution would be to connect macros to buttons that open the target URL in the user's $browser, but this loses the "all-in-one-place" appeal. Other options include using small web browsers (via a script) in a floating window in OOo and/or pestering the OOo developers to add this functionality. I welcome any suggestions on this issue.

I hope my progress thus far has been sufficient (I've gotten pretty good feedback from both OpenMRS and OpenOffice.org, so that's good), and I look forward to continuing for the rest of the summer and hopefully beyond as well. I truly enjoy doing this.


The OpenOffice.org Community


In order to work on this project, I've had to involve myself in the OpenOffice.org community. I've been reading and posting to the Users mailing list as well as the XML developers' list and hanging out on IRC. This has all been immensely helpful and rewarding to me because I'm learning about OpenOffice by reading and answering other users' questions. You learn a lot about a subject when you're helping others because you're forced to really get in there and figure out how something works.

Andrew Pitonyak, author of the previously mentioned book, "OpenOffice.org Macros Explained", has noticed my work with XForms on my blog and the OpenMRS wiki and dealt some flattery my way by suggesting that I write the XForms documentation for the OpenOffice manual at OOoAuthors. I look forward to contributing to this as much as I can. There's quite the lack of OOo XForms documentation out there, and I'd very much like to share what I've learned with others.


All in all, this has been a very exciting and enjoyable summer, and I look forward to what lies (lays?) ahead.
