Tuesday, June 28, 2011

Developing software is a lot like solving a Rubik's Cube. I know that software --> Rubik's Cube analogies are overused, but allow me to make my case:


Ripley Rubik is a customer who approaches your company, Cubits Solutions, and explains he has a problem. His company, Rubik's Polychromatic Six-Sided Wooden Enjoyment Cubes, uses a wooden block as part of their manufacturing process. The block is painted in 6 colors (green, red, yellow, blue, white, and orange), with any one side having a seemingly random mix of all 6. He explains to you that his employees currently have to search all over the block to find all the blue parts, then all the red parts, and so forth until all parts of each color have been found. The problem is, this takes a lot of time and employees, and he's looking for a way to speed up the manufacturing process and cut down on costs. You think you can help Ripley out, so you schedule a meeting with your chief architect, lead developer, interface designer, and project manager.

During the meeting, everyone decides that having more of the colors grouped together would definitely speed up the process. Since a cube has 6 sides and there are 6 colors, it's decided that having no more than 3 colors on each side would significantly reduce the time spent searching for all the colors. Ingrid, the interface designer, creates some drawings based on this, and you show them to Ripley.

Ripley immediately rejects this idea and tells you that, while he likes the idea of grouping together the colors, your design won't work because the center portion of each side can only be one color, and each side has to have a different color at the center. You return to the drawing board, and the team comes up with a new design: each side would have a single color at the center, and the 4 corners of a side would also be the same color as the center.

Ripley likes this idea, but he's concerned that it will still take too much time to search for all the colors. Ingrid suggests that there could be only one color per side, and everyone loves the idea. She creates another set of mockups and shows them to Ripley. Unfortunately, Ripley says this won't work because the layout is all wrong - his process requires that white be on the front of the cube, with red on the left side, orange on the right side, green on the top, blue on the bottom, and yellow on the back. You suggest that, instead of a cube, you create a suite of squares, one of each color, so that Ripley could orient them however he likes. He rejects this idea because he knows his budget would never allow for the maintenance of six separate tools, and he'd really prefer a single cube so that a single employee could use it if necessary.

With these new requirements in mind, the team goes to work on writing a requirements specification. Ripley signs off on the design, which says something to the effect of "The product must be a wooden block with six sides (a cube). Six colors must be represented on the cube: green, red, yellow, blue, white, and orange. There can only be one color per side, and each side must be a different color. The colors must be oriented as follows: white on the front-facing side, red on the side to the left of white, orange on the side to the right of white, green on the topmost side (above white), blue on the bottommost side (below white), and yellow on the rear side (opposite of white)."

Now that the requirements have been specified, Archie, your chief architect, goes to work on the technical design. His first suggestion is to build a new layer on top of the current design (the random assortment of colors) - he could paint a solid color onto each side rather than build the entire project from scratch. However, Lenny, the lead developer, points out that this solution would end up leading to more problems down the road because the underlying colors could still cause problems if they bleed through to the top layer. Percival, the project manager, agrees this could be a problem, and he also asks Ripley if his budget allows for a complete rebuild, or if he'd rather they adapt his current wooden block.

Ripley decides it would be too costly to build an entirely new block, so the team will have to build upon the current design. Since the requirements have changed again, Percival amends the requirements specification to read "The product must build upon Rubik's Polychromatic Six-Sided Wooden Enjoyment Cubes' current wooden block."

Archie goes back to work on the architecture, and comes up with a plan to divide the cube into smaller squares. Each square would contain a single color, and these squares could then be moved around to their proper side. In order to facilitate the movement of the squares, he comes up with a system based around a central core, where each square would be able to pivot around on screws and springs. He first divides each side of the cube into a 5x5 grid of squares (a 5x5x5 cube), but decides this is too complex and reduces it to 3x3 (a 3x3x3 cube).

Lenny likes this idea, but he's concerned that the wood will be very difficult to affix screws and springs to (in addition to being difficult to maintain later down the road), and suggests they look into a new technology called plastic. Percival brings this idea to Ripley, who decides that the initial cost of remaking his cube out of plastic is indeed worth saving maintenance costs later on. Percival amends the requirements again to replace all instances of "wooden" with "plastic", and Cubits Solutions hires a new developer who's an expert in plastics.

Lenny and Archie design a system of rotations to put the blocks into place. They come up with some algorithms for moving the squares around, and decide that they will only rotate one side at a time. They decide to call the sides L, R, U, D, F, and B, for left, right, up, down, front, and back, respectively. They create a new framework called Solve which specifies individual operations which can be performed on a section of squares: CW and CCW, for clockwise and counter-clockwise rotations. In order to perform an operation, a developer would write "LCW" to indicate that the left side should be rotated clockwise.
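For the curious, here is a minimal sketch of how the notation above could be read by a program. Solve is a made-up framework, so everything here (the parse_move name, the lookup tables) is purely hypothetical; it only splits a string like "LCW" into a face and a direction.

```python
# Hypothetical sketch: Solve is fictional, so these names are assumptions.
FACES = {"L": "left", "R": "right", "U": "up",
         "D": "down", "F": "front", "B": "back"}
DIRECTIONS = {"CW": "clockwise", "CCW": "counter-clockwise"}


def parse_move(move):
    """Split a move such as "LCW" into (face, direction)."""
    # The first letter names the side; the rest names the rotation.
    face, direction = move[:1], move[1:]
    if face not in FACES or direction not in DIRECTIONS:
        raise ValueError("unrecognized move: %r" % (move,))
    return FACES[face], DIRECTIONS[direction]


print(parse_move("LCW"))
print(parse_move("FCCW"))
```

A list of such moves is all a developer would need to write down at the end of the day.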

With the new cube fully designed and planned out, development can begin. Percival plans out a workflow for the developers, and it's decided that 6 developers will each work on an individual color/side. With the guidance of Lenny and Archie, he decides that each developer will first align the corners of their side, and then move on to the other squares. Developers will spend the day writing down the steps needed to align their colors (FCCW, UCW, RCW, etc), and at the start of the next day, they'll take turns rotating parts of the cube and spend the rest of the day writing down new instructions. The developers write down the first day's instructions and then go home.

At the start of the second day, Bluto, the developer assigned to blue, performs all of his rotations and aligns the corners. However, once Rory, the developer for red, performs his first rotation, it's immediately clear there's a problem - blue's side is no longer aligned! Percival sits down with Lenny, Archie, and the developers, and they start to come up with a new plan. They decide to only operate on certain sections of the cube at a time, and Solve's algorithms are refined to allow more complex operations which preserve the structure of other parts of the cube. A new development plan is written that allows developers to take turns performing operations instead of doing a large set of operations at once.

Furthermore, they implement a test plan to be used after each operation: each side would be checked to ensure that no pieces were moved unexpectedly. Additionally, once a corner or section was put into place, developers would verify that the colors were still aligned correctly.

Percival continues to oversee development while keeping Ripley updated on the project's progress. After two weeks, all sides of the cube have been aligned to a single color, and the team checks this against the requirements to ensure that everything is as it should be. Ripley is pleased with the product and happily writes a check to Cubits Solutions. Success!


So there you have it: an unnecessarily long story which attempts to explain software development in terms of a Rubik's Cube while also demonstrating my ignorance about how solving algorithms actually work =)

Saturday, June 11, 2011

ABC, 123, Lucky Me

For no reason in particular, I decided to go to Google and enter each letter of the alphabet (lowercase a,b,c,...,z) and each single digit (0,1,2,...,9) and hit the "I'm Feeling Lucky™" button. In case you don't know:
The "I'm Feeling Lucky™" button automatically takes you to the first web page returned for your query.

An "I'm Feeling Lucky" search means less time searching for web pages and more time looking at them.


As expected, many of the pages returned were Wikipedia entries, especially for the numbers. There were a few surprises, including a non-Oprah result for 'o', and a relatively new Google service for '1'. If there's any interest in this little experiment, I may go back through and list the first non-Wikipedia results.

Here are the results for each search:


a

A - Wikipedia
http://en.wikipedia.org/wiki/A

b

B - Wikipedia
http://en.wikipedia.org/wiki/B

c

Citigroup, Inc. New Common Stock - Yahoo! Finance
http://finance.yahoo.com/q?s=C

d

Intro - D Programming Language 2.0 - Digital Mars
http://www.digitalmars.com/d/

e

E! Online
http://www.eonline.com/

f

Ford Motor Company Common Stock - Yahoo! Finance
http://finance.yahoo.com/q?s=F

g

Gmail
https://www.google.com/accounts/ServiceLogin?service=mail&passive=true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&bsv=llya694le36z&scc=1&ltmpl=default&ltmplcache=2&from=login

h

Hydrogen - Wikipedia
http://en.wikipedia.org/wiki/Hydrogen

i

I - Wikipedia
http://en.wikipedia.org/wiki/I

j

J.Crew
http://www.jcrew.com/index.jsp

k

K - Wikipedia
http://en.wikipedia.org/wiki/K

l

L - Wikipedia
http://en.wikipedia.org/wiki/L

m

M (1931) - imdb
http://www.imdb.com/title/tt0022100/

n

N - Free Online Action Games from AddictingGames
http://www.addictinggames.com/action-games/ngame.jsp

o

"O" - Cirque du Soleil
http://www.cirquedusoleil.com/en/shows/o/default.aspx

p

P paragraph HTML 4.01 strict
http://www.december.com/html/4/element/p.html

q

Q (Star Trek) - Wikipedia
http://en.wikipedia.org/wiki/Q_%28Star_Trek%29

r

The R Project for Statistical Computing
http://www.r-project.org/

s

Sprint Nextel Corporation Comm - Yahoo! Finance
http://finance.yahoo.com/q?s=S

t

Massachusetts Bay Transportation Authority
http://www.mbta.com/

u

U - Wikipedia
http://en.wikipedia.org/wiki/U

v

V - ABC.com
http://abc.go.com/shows/v

w

W. (2008) - imdb
http://www.imdb.com/title/tt1175491/

x

-X- the band official website
http://www.xtheband.com/

y

the Y: Find Your Y (YMCA)
http://www.ymca.net/find-your-y/

z

Z (1969) - imdb
http://www.imdb.com/title/tt0065234/



0

0 (number) - Wikipedia
http://en.wikipedia.org/wiki/0_%28number%29

1

Google +1 Button
http://www.google.com/+1/button/

2

2 (number) - Wikipedia
http://en.wikipedia.org/wiki/2_%28number%29

3

Theband3.com - Official Site of 3
http://www.theband3.com/

4

4 (number) - Wikipedia
http://en.wikipedia.org/wiki/4_%28number%29

5

5 (number) - Wikipedia
http://en.wikipedia.org/wiki/5_%28number%29

6

6 (number) - Wikipedia
http://en.wikipedia.org/wiki/6_%28number%29

7

7 (number) - Wikipedia
http://en.wikipedia.org/wiki/7_%28number%29

8

8 (number) - Wikipedia
http://en.wikipedia.org/wiki/8_%28number%29

9

9 (2009) - imdb
http://www.imdb.com/title/tt0472033/


Monday, May 9, 2011

Scrappy code sample

Here's a very short and simple code sample that I feel highlights my coding style and design philosophies. It's taken from scrappy, an IRC bot that a friend and I have hacked around on for several years.

It's part of the event handling framework for the core bot. When one of the bot's modules wants to register with a new event (like registering an action to be taken when someone types a bot !command in an IRC channel), it simply calls bot.register_event() and passes the type of event to hook on to (msg, CTCP, join, etc), the name of the function(s) to be called for that event, and a reference back to the calling module.

Whenever the bot's IRC socket handles an event, the appropriate function for that event type is called, and a new thread is spawned for each module-registered event function. You may recognize this as being very similar to the observer design pattern.

(To view this without scrolling, visit https://gist.github.com/mharrison/19156f2af72fa8048fab)



Here's why I chose this sample:

  • It's simple. The code is short and sweet, and it's both easy to understand and simple to write.

  • It's intuitive. This is the most logical and "common sense" way to call a series of event-based functions. It's similar to the way your brain would handle such a task.

  • It's elegant. The code is formatted nicely, and the source is easy to read.

  • It's user-friendly. Any user-written module easily interfaces with this via the register_event() function, and the different parts of the message are passed on to the called module in a nice package (dict) that other programmers can easily use.

  • It's pythonic. No code is reinvented. It uses the "batteries included" thread module to do the "heavy lifting" of spawning multiple threads. The thread module automagically handles all of the necessary locking and signaling associated with threading. It also handles KeyboardInterrupt and sys.exit() for you, ensuring that a thread cleanly finishes before the program terminates. Also, if one of the called functions should fail or raise an error, it is cleanly handled in the "background" and the rest of the bot will continue running without interruption.



So there you have it. A code sample which reflects my design philosophies of being simple, clean, elegant, and taking advantage of any system/language libraries available to the programmer.
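
For readers who can't load the embedded sample, the register-and-dispatch flow described above can be sketched roughly as follows. This is not scrappy's actual code: only the register_event() name comes from the post, while the Bot class, the dispatch() method, the argument order, and the shape of the event dict are assumptions made for illustration. It is written for Python 3 and uses the standard threading module rather than the older thread module.

```python
import threading
from collections import defaultdict


class Bot:
    """Hypothetical stand-in for the bot core, not scrappy's real class."""

    def __init__(self):
        # Maps an event type ("msg", "join", ...) to its registered callbacks.
        self.events = defaultdict(list)

    def register_event(self, modname, event_type, func):
        """Let a module hook a function onto one type of event."""
        self.events[event_type].append((modname, func))

    def dispatch(self, event_type, event):
        """Run every callback registered for event_type in its own thread."""
        threads = []
        for modname, func in self.events[event_type]:
            worker = threading.Thread(
                target=func,
                args=(self, event),
                name="%s:%s" % (modname, event_type),
            )
            worker.daemon = True  # don't keep the process alive on exit
            worker.start()
            threads.append(worker)
        return threads


def greet(bot, event):
    # The event arrives as a plain dict the module can pick apart.
    print("%s said %s" % (event["nick"], event["text"]))


bot = Bot()
bot.register_event("greeter", "msg", greet)
for worker in bot.dispatch("msg", {"nick": "alice", "text": "!hello"}):
    worker.join()
```

Because each callback runs in its own thread, an exception raised inside one of them is reported by that thread and does not stop the dispatcher or the other callbacks.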

Monday, December 20, 2010

You need to unplug your web server

You've all seen this: you're browsing the web, reading an article, when suddenly the page dims and THIS pops up:






Any time I'm on a site that does this, I immediately close the page. I don't care how interesting the article is. Even if it's lifesaving advice that I need to read or I will die RIGHT NOW, I close the page as soon as that pops up. It's intrusive (or obtrusive?), and it's terrible site design.


If your website does this, you need to unplug your web server from the internets immediately.

Thursday, April 1, 2010

Tag Literacy

Recently, I decided to make solving the problem of searching for images on the web my goal for the near future. It can be very difficult to find relevant search results because images are tagged either too broadly by humans ("woman", "purple", "car", "person"), or software is returning images based on the content of the surrounding text, which isn't always relevant (a big issue in Google Image Search).

I'll be writing a lot more about this issue on this blog, but today I bring you an interesting read I just found. Ulises Mejias, an assistant professor at SUNY Oswego, wrote an article on his blog called "Tag Literacy" way back in 2005. He talks about distributed classification systems (DCSs), what makes a good tag, and the social value of tags. He touched upon a lot of the issues with tags and how we use them, and I definitely learned a lot from his post.

Definitely check it out at http://blog.ulisesmejias.com/2005/04/26/tag-literacy/

Wednesday, March 24, 2010

Change Firefox 3.6.2 Default Tab Ordering Behavior

If you've recently upgraded to Firefox 3.6.2, you may have noticed how the default behavior for opening new tabs by middle-clicking links has changed. Specifically, new tabs open immediately after the current tab instead of at the end of the tabs.

This behavior drove me *nuts* since I prefer my tabs be in a FIFO or First In, First Out ordering - I expect that the first tab I opened be the tab I'm going to read next, and the most recent tab I've opened should be at the end of the list. The new LIFO or Last In, First Out ordering made little sense to me.

Since I couldn't find any apparent way to change this in the preferences, I cracked open about:config and searched for "tabs". I found the "browser.tabs.insertRelatedAfterCurrent" option, which was set to "true", and I changed it to false. Perfect! Now my tabs are back to the way they used to be.

The Fix:
Type "about:config" into your address bar, or click here for a direct link.

Search for "tabs.insert" and you should see the "browser.tabs.insertRelatedAfterCurrent" entry.

Simply double-click this entry to set its value to "false".

UPDATE:
I found this issue in the bug tracker. Apparently it's been marked WONTFIX =(

Wednesday, March 10, 2010

Facebook Ads

Last night I decided to create a Facebook ad to promote my photography. I was surprised at just how inexpensive advertising was. When you create an ad (limited to 135 characters), you choose between CPC (cost per click) and CPM (cost per 1,000 impressions, or views). With CPC, you pay only when someone clicks on your ad. With CPM, you pay for every 1,000 views (impressions) of your ad. For either model, you place a "bid" per click or per 1,000 views. Facebook then selects which ads to display based on which has the higher bid. You can also target your ads based on demographics and keywords, and you set a maximum daily budget.

For my ad, I chose the CPM model since it "felt" like it would get displayed more often, although I honestly don't fully understand the difference between the two models (leave a comment if you can explain it!). I targeted it to everyone within 50 miles of Amherst, MA, and I added keywords like "photography, photos, pics, wedding, graduation". According to their estimate, I would reach about 28,000 users. Based on the bids for other ads targeting my demographics, Facebook suggested a bid of $0.24-$0.29, and I went with $0.27. At the time of this writing, my ad has had 14,619 impressions and 0 clicks.

I can understand why I've gotten 0 clicks despite so many views - I barely even notice Facebook ads, let alone actually click them. I'm only doing 1 day of advertising (with a max budget of $2.50) as well, which isn't very long at all. I think I'm going to experiment with different wording, keywords, and running times to see if anything works better. I'll also try running an ad for a full week closer to graduation. My hope is that folks who are talking to friends about wanting wedding/graduation photos will see my ad displayed and click it.

Update: My ad has finished its budgeted run of $2.50 with 18,138 views and 1 click.
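
To put those final numbers side by side, here is a quick back-of-the-envelope calculation. It assumes the whole $2.50 budget was actually spent, which is how I read "finished its budgeted run"; the effective_rates helper is just a name made up for this sketch.

```python
def effective_rates(spend, impressions, clicks):
    """Return (cost per 1,000 impressions, cost per click or None)."""
    cpm = spend / impressions * 1000
    # With zero clicks there is no meaningful per-click cost.
    cpc = spend / clicks if clicks else None
    return cpm, cpc


# Figures from the update above; spend assumes the full budget was used.
cpm, cpc = effective_rates(spend=2.50, impressions=18138, clicks=1)
print("effective CPM: $%.3f" % cpm)
print("effective CPC: $%.2f" % cpc)
```

Under that assumption, the run worked out to roughly $0.14 per thousand views, and the single click effectively cost the entire $2.50.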

Monday, March 1, 2010

Security vs. Privacy

I was just having a discussion with someone on freenode (my username there is [mharrison]) about security vs. privacy. When they joined the channel, I mentioned that they were from Massachusetts (given their Comcast hostname), and they immediately asked how they could hide that. I asked why that was important, and they cited the usual - they didn't want any creepy Internet stalkers knowing where they were.

So what did I do? I volunteered my address and invited them to come visit. I've already sacrificed my privacy online, as most of us have. But I maintained my security. There's a big difference between the two. If I wanted privacy, I would have taken steps to hide my IP, name, and address. But why would I do that? What would this person have to gain?

Even after sacrificing privacy, I maintained my security. This person knows nothing of my passwords. They know nothing of my banking institution(s), my logins or passwords, nada. They know nothing of my physical security either - what's in my apartment, who lives around me, who I REALLY am, etc.

So why am I so secure in giving out my address? I'm confident that this person doesn't have either the motive or the interest to launch an attack on me, electronic or physical. 99% of the population doesn't care who I am or what they have to gain from me. And on the off chance that this person WAS in the 1%, they're the ones going into an unknown situation - not me.

If you'd like to learn more about security and privacy, I highly recommend Bruce Schneier's blog.

Tuesday, February 23, 2010

Back...hopefully.

I like to take multiple-year breaks from blogging just to make sure the impact of my words really has a chance to settle in.

But, theoretically, I'm back now. Why? Because I'm nearing the end of my undergraduate stint, and I have a few ideas cooking around in my head that I'd like to talk about (and think about) some more while I explore my options in the job world. I have a pretty good idea about what I would like to research/work on, and I have a pretty good idea about the places where I'd like to do that, but we'll have to see which way the cookie crumbles.

For now, here's to hoping there will be something interesting to read on this blag soon.

Tuesday, December 9, 2008

Setting up Django on Jython with sqlite

After spending a night of frustration trying to get Django running under Jython and using sqlite3, I finally figured it all out. Here's the short version of what you need to do on a linux-y system (Windows users are on their own):



  1. Have the latest django from svn.
    $ svn co http://code.djangoproject.com/svn/django/trunk/ django-trunk

    If you haven't already done so, continue setting up as per the instructions at Django's site.



  2. Get the latest jython from svn. Pre 2.5 will NOT work.
    $ svn co https://jython.svn.sourceforge.net/svnroot/jython/trunk/ jython
    $ cd jython
    $ ant
    $ export PATH=$PATH:`pwd`/dist/bin




  3. Get django-jython from svn.

    $ svn co http://django-jython.googlecode.com/svn/trunk/ django-jython
    $ cd django-jython
    $ jython setup.py install




  4. This was the part that caused most of the headaches, getting sqlite to work.
    Grab SQLiteJDBC.
    Add the .jar to your $CLASSPATH.

    $ export CLASSPATH=$CLASSPATH:/path/to/jar/sqlitejdbc-v###.jar




  5. Add Django to your $JYTHONPATH.

    $ export JYTHONPATH=$JYTHONPATH:/path/to/your/python/libs/site-packages/django

    On OS X, this is /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django .



  6. Add django-jython to $JYTHONPATH.

    $ export JYTHONPATH=$JYTHONPATH:/path/to/django-jython




  7. You should be able to create new Django projects with

    $ jython django-admin.py startproject mydjangoproject




  8. Edit your mydjangoproject/settings.py to include the sqlite3 database backend provided by django-jython.

    DATABASE_ENGINE = 'doj.backends.zxjdbc.sqlite3'




  9. Syncdb and runserver.

    $ cd mydjangoproject
    $ jython manage.py syncdb
    $ jython manage.py runserver


    ***NOTE***
    syncdb outputs a big nasty error for me. This is an issue with jython and developers are looking at it now. My project still seems to work after this error, but don't use this for any important data because it's not guaranteed to work.

    ***NOTE***
    When starting your project, you may encounter the following error:

    Error: Could not import settings 'mydjangoproject.settings$py' (Is it on sys.path? Does it have syntax errors?): No module named settings$py

    This is because jython generates a "settings$py.class" when it compiles it, and manage.py loads this thinking it's settings.py. You can delete settings$py.class each time before you attempt to use manage.py, or apply this patch by Frank Wierzbicki.
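
If you go the delete-it-each-time route, a small helper along these lines can save some typing. This is only a sketch: the remove_compiled_settings name is made up, and it assumes the stale file sits next to settings.py in the project directory, as it did for me.

```python
import os


def remove_compiled_settings(project_dir):
    """Delete a stale settings$py.class so manage.py loads settings.py.

    Returns True if a file was removed, False if there was nothing to do.
    """
    # Assumption: the compiled file lives directly in the project directory.
    target = os.path.join(project_dir, "settings$py.class")
    if os.path.exists(target):
        os.remove(target)
        return True
    return False


if __name__ == "__main__":
    # "mydjangoproject" is the example project name used in the steps above.
    remove_compiled_settings("mydjangoproject")
```

Run it just before each manage.py command. Jython will simply regenerate the class file the next time it compiles settings.py.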




I'm writing this entirely from memory, so I almost certainly have forgotten some steps. Please mention any trouble you have via comments to this blog post so I can update my post accordingly.

Wednesday, June 11, 2008

How Many Pages Per Mile Can You Read?

I can read about 20.

What the heck am I talking about? My new love: reading while walking! I have a hard time focusing on one book for more than a few pages while sitting still and reading before the old ADHD kicks in. It doesn't matter how interested I am in the book, fiction or non-fiction, I always need to stop reading to act upon some thought or overwhelming impulse that I've just had.

So today I thought I'd try something new. I had just gotten Your Brain: The Missing Manual (see my previous post). I was about to go for my daily walk, but I really really wanted to read this. So I thought why not, I'll bring the book with me and read it while walking. Worst case, that proves to be difficult and I just carry the book with me on my walk.

What a pleasant surprise! I found that I not only walked my usual, very brisk pace, but that I actually managed to read 64 pages; 3 whole chapters! Reading 64 pages at home would be nearly impossible for me. I actually retained all of the material and had a very enjoyable walk. When I got back home, I decided to do a useless calculation:

First I used Google Maps to map out my route. I had guesstimated it was 3 miles long, but it turns out it's 3.2 miles. Feel free to stalk me on my daily walks if you know where this is.


64 pages over 3.2 mi = 64/3.2 = 20 pages per mile. A completely useless statistic, but fun to know.

So why this blog post? I feel like this is a mind hack. I typically can't focus enough to read a lot at once, so this seems like a great way to get me reading on the daily walks that I enjoy so much. I'd love to hear results from anyone else who has tried this or would like to! Apparently, there's even a wikiHow article on the subject.

Friday, June 6, 2008

Romance and JavaScript

I just received a book order from O'Reilly Media. It didn't quite go as planned...
Here is the email I sent their customer service department.

I daresay there was a problem with the book order I just received. I had ordered CSS Pocket Reference (9780596515058), HTML and XHTML Pocket Reference (9780596527273), and JavaScript Pocket Reference (9780596004118). I received the CSS and HTML Pocket References, however, instead of the JavaScript Pocket Reference, I received "On a Wild Night", a paperback romance novel by Stephanie Laurens. Needless to say, this is not what I want. While I appreciate O'Reilly Media trying to add some excitement and romantic zest to my life, I'm far more in need of exciting JavaScript flings at the moment. I can't even imagine how this book was placed in my order, since O'Reilly Media focuses on technical computer books, and doesn't even sell this book. Perhaps Ms. Laurens is not selling as many books as she would like, and is subverting O'Reilly's shipping department by slipping in copies of her books to unsuspecting customers, in the hopes they will enjoy it and buy more?

In any case, please send the JavaScript Pocket Reference, as that would be absolutely wonderful. Again, the order number is 171212.11860740.

Yours in romance,
Matthew Harrison



EDIT: This was replied to by a very nice customer service rep named Tammie. She said I made her Friday and she loves my humor =) She's going to send me the JavaScript book, and in addition, said I could pick ANY O'Reilly book and she would send that as well! So I picked Mind Performance Hacks, and as an added, super-awesome bonus, Tammie said she included an additional surprise book for me! (It's this book).

Talk about caring about your customers! O'Reilly not only has an awesome collection of books that any geek desires to have, but they clearly have really good customer support. O'Reilly++

Wednesday, May 28, 2008

GSoC 2008

It's very, very difficult to believe that a year has gone by since I started this blog, and it's already time for Google Summer of Code 2008.

I see I haven't made a post since last August. It seems that whenever I try to blog, I inevitably run into the dilemma of being busy enough to have tons of stuff to blog about, but being too busy to actually blog about it. Hopefully I'll get a good run before I end up doing that again. Here is a very brief summary of what I've been up to since last summer:

* I'm no longer working on my project for OpenMRS. It's moving on in new and exciting directions and I believe there's some GSoC work being done on it this year. I plan to work on their documentation restructuring project with Michelle Murrain, which will hopefully be a project that I continue with well after the summer.

* I have an iPhone! It's fantastic. I've never owned a cell phone before, but this is so much more than a phone. I immediately jailbroke it and installed many extraordinarily useful applications. I don't know what I'd do without this - I've gotten so much utility out of it. It's great having Internet everywhere I am, even if it is EDGE. Now that the iPhone SDK is out, I really hope I'll have some time to hack around and make my own apps.

* I got an XO! I asked for one through the Give One Get One program for Christmas. Unfortunately, I haven't had much time to dedicate to hacking on it, but I have some ideas.

* I took "Intro to Computation" in school, which is basically discrete math. I learned a lot of really cool things, and next semester I'm taking Intro to Algorithms, Intro to Software Engineering, and Programming Language Paradigms. Good stuff.

* My ThinkPad died =( Something weird happened - it would cycle from battery to AC adapter rapidly, and wouldn't hold a charge. One day the plastic of the adapter melted within the plug, and I can't charge or use it. Otherwise, it's fully functional, and I was thinking of getting a dock and charger to see if I could use it like that.

* I replaced the ThinkPad with...a Blackbook! Wonderful machine. The built-in resolution is a bit small, but I have it hooked up to my 19" LCD and am using a dual-monitor setup so I have lots of screen real estate. Good stuff.

* I have a couple of awesome new lenses for my camera, but that's a story for another blog =)

This year, I've been accepted to GSoC to work with the Software Freedom Conservancy on a project to modernize their web annotation system. Proposal here. I'll be working with JavaScript and jQuery to write the system, so I've been reading tutorials for both and playing around. jQuery looks really, really cool. I'm excited. I'm working with Joshua Gay, an active member of the free software community. I've actually known him for years now - we met through a mutual friend and he used to go to the same school as me. I anticipate having a really good working relationship with him and getting some good stuff done.

That's all for now. Check back for updates!

Wednesday, August 15, 2007

Capturing the Session ID

I'm out here in Boston visiting the PIH offices and meeting all of the OpenMRS guys from the Boston and Indianapolis offices (more on that later).

Burke and I have been trying to figure out why the forms in OpenOffice.org aren't actually HTTP POSTing the XML data to the OpenMRS server. After scratching our heads and checking the server logs, he suggested I use tcpmon to capture the data from OpenOffice to see what was going wrong.

tcpmon is pretty cool. It works by creating a sort of proxy between your localhost and another server. You give tcpmon a local port and a server and port to connect to, and it relays all of the information sent to localhost:port to server:port, capturing all of the data in between. I tried it out with my OpenMRS installation at home, and here's the result of submitting a form:
HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=AFD3B8B163899870591929C471BEE8B7; Path=/openmrs
Location: http://localhost:8084/openmrs/logout
Content-Length: 0
Date: Wed, 15 Aug 2007 13:46:45 GMT

HTTP/1.1 405 HTTP method POST is not supported by this URL
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=104147D9161F0663F217CC856D2056FE; Path=/openmrs
Content-Type: text/html;charset=utf-8
Content-Length: 1115
Date: Wed, 15 Aug 2007 13:46:45 GMT

HTTP Status 405 - HTTP method POST is not supported by this URL


type Status report

message HTTP method POST is not supported by this URL

description The specified HTTP method is not allowed for the requested resource (HTTP method POST is not supported by this URL).


Apache Tomcat/6.0.13




This seems to indicate that there's a problem with authentication. I'm given a JSESSIONID and immediately sent to the logout page. I'm also given an HTTP error 405.
The forms in InfoPath use the taskpane combined with some JavaScript trickery to capture the JSESSIONID for an authenticated session so the form is free to communicate with the server and the server thinks this is just a regular Internet Explorer session. I'm thinking that if I can somehow capture this JSESSIONID in OpenOffice, I can use it the same way that InfoPath does. How I'm going to do this, I don't know. It may finally be time to crack into my Macro book and see if I can do some similar HTTP trickery. When I return home and have access to InfoPath, I'll sic tcpmon on it and see what's happening.
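
Pulling the session ID out of a captured response is the easy half of the problem. Here's a minimal Python sketch of that step, assuming the raw text comes from a tcpmon capture like the one above. The function name extract_session_id is my own invention for illustration and isn't part of tcpmon or OpenMRS:

```python
from http.cookies import SimpleCookie


def extract_session_id(raw_response, cookie_name="JSESSIONID"):
    """Return the first cookie_name value found in Set-Cookie headers, or None.

    extract_session_id is a hypothetical helper name used for illustration.
    """
    for line in raw_response.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "set-cookie":
            cookie = SimpleCookie()
            cookie.load(value.strip())
            if cookie_name in cookie:
                return cookie[cookie_name].value
    return None
```

Getting an ID that belongs to an authenticated session, the way the InfoPath taskpane does, is the part I still have to work out.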

Wednesday, August 8, 2007

An OpenOffice.org Annoyance

While working on a form in OpenOffice.org, I noticed something quite annoying.
When you create a form element and bind it to the XML, you can choose what data type you want; date, decimal, string, boolean, etc. On certain data types, such as a decimal, you can define a minimum and maximum range for the data entered:


One would expect you to be able to define a separate range for each different control with type Decimal, right? Wrong. Apparently, if you define a range for ANY control that's type Decimal, it applies to ALL controls of type Decimal. I can't imagine why this is implemented as such because I don't see why you'd want the same range for all different controls. Needless to say, it's problematic.

I'm trying to find the best way to limit the range of values entered for controls such as obs/pulse/value in the OpenMRS forms. In InfoPath, this control (and others like it) has a custom data type, pulse_type_restricted_type, limited to a range of 0-230, inclusive. The entry from the XML schema looks like this:
<xs:simpleType name="pulse_type_restricted_type">
<xs:restriction base="xs:int">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="230"/>
</xs:restriction>
</xs:simpleType>

OpenOffice.org doesn't seem to have this feature. While I can add a custom data type named pulse_type_restricted_type, there doesn't appear to be a way to define a valid range for it. There's an option to add a "Constraint", but I'm not sure how this works. As you can see here, the help files on this are somewhat less than useful:


I'll post some questions to mailing lists and update this post when I've found a solution.
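
For reference, the check that pulse_type_restricted_type describes is simple to state outside of any form tool. This is a rough Python sketch that reads the bounds from the schema entry above and tests a value against them. The xmlns:xs declaration is added so the snippet parses on its own, and the helper names read_int_range and is_valid are made up for illustration. It says nothing about how OOo's "Constraint" option works.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The schema entry from this post, with a namespace declaration added
# so that it can be parsed as a standalone document.
SCHEMA_SNIPPET = """
<xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
               name="pulse_type_restricted_type">
  <xs:restriction base="xs:int">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="230"/>
  </xs:restriction>
</xs:simpleType>
"""


def read_int_range(simple_type_xml):
    """Return (min, max) from an xs:simpleType with inclusive integer bounds."""
    root = ET.fromstring(simple_type_xml)
    restriction = root.find(f"{{{XS}}}restriction")
    low = int(restriction.find(f"{{{XS}}}minInclusive").get("value"))
    high = int(restriction.find(f"{{{XS}}}maxInclusive").get("value"))
    return low, high


def is_valid(value, bounds):
    """True if value lies inside the inclusive bounds."""
    low, high = bounds
    return low <= value <= high
```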

Monday, July 23, 2007

FindBugs

As I was perusing the Google Code site, I came across a really cool tool called FindBugs™. FindBugs™ is an open source Java debugging tool that scans through all of your Java class and source files to find programmer-error bugs. From their fact sheet:

"FindBugs looks for bugs in Java programs. It is based on the concept of bug patterns. A bug pattern is a code idiom that is often an error. Bug patterns arise for a variety of reasons:

* Difficult language features
* Misunderstood API methods
* Misunderstood invariants when code is modified during maintenance
* Garden variety mistakes: typos, use of the wrong boolean operator

FindBugs uses static analysis to inspect Java bytecode for occurrences of bug patterns. Static analysis means that FindBugs can find bugs by simply inspecting a program's code: executing the program is not necessary. This makes FindBugs very easy to use: in general, you should be able to use it to look for bugs in your code within a few minutes of downloading it. FindBugs works by analyzing Java bytecode (compiled class files), so you don't even need the program's source code to use it. Because its analysis is sometimes imprecise, FindBugs can report false warnings, which are warnings that do not indicate real errors. In practice, the rate of false warnings reported by FindBugs is less than 50%."


You can use FindBugs as a standalone Java WebStart application, or as a (very handy) plugin to Eclipse. Very cool. I wanted to test this out on something, and what better source than the OpenMRS codebase? I grabbed the current alpha branch and tested both the WebStart application and the Eclipse plugin. The Eclipse plugin found 985 bugs and proved itself to be quite useful.

Here are instructions for using the FindBugs Eclipse plugin on the OpenMRS codebase (although this will work for any Eclipse project).

Install the FindBugs Plugin


1. Fire up Eclipse. Go to Help > Software Updates > Find and Install.
2. Select "Search for new features to install" and click Next.
3. Click "New Remote Site" and enter "FindBugs update site" for the Name and http://findbugs.cs.umd.edu/eclipse/ for the URL and click OK:

4. Make sure "FindBugs update site" is checked and click Finish.
5. Expand the trees in the next window and check "FindBugs Feature" and click Next.
6. Accept the license and click Next.
7. Click Finish, follow any prompts that may come up, and restart Eclipse when prompted.

Using FindBugs on Your Project


1. If the Java files you want to debug aren't already added to Eclipse, create a new project via File > New > Project.
2. Once your project is in Eclipse, right click on the project name and go to Find Bugs > Find Bugs:

3. Let FindBugs work its magic. When it's done, all of the bugs found will be listed as Warnings in Eclipse's Problems view.
4. Opening FindBugs' view is much more useful than the Problem view. Go to Window > Show View > Other, then expand the FindBugs tree and select Bug Tree View:

5. This creates a nifty little view with all of the bugs found, categorized by bug type:

6. And now the really cool part: Click on an individual bug, and it will jump to that line of source where you can fix your code:

7. Impress the world by debugging all of your code, and then debug all of the world's as well.


I hope this is useful!

Saturday, July 14, 2007

Mid-term Report

It's hard to believe, but it's already time for the middle-of-the-term progress report for Summer of Code, and that applies to us OpenMRS interns as well. So, here goes.

I'm back from Florida (pictures at http://pics.discosoup.net/FL.html for anyone interested) and back to work. The question I've been asking myself this past week is "How can I make things simpler?" in terms of form creation in OpenOffice.org. OpenOffice's XForms component is much less polished than Microsoft InfoPath, so creating a form requires a few more manual steps. The steps aren't difficult, but they're tedious and time-consuming, and I suspect they look daunting to OpenMRS implementors and developers who may be converting forms to OOo. I wanted to see if I could cut them out.

Data Types


The first tedious step involves manually entering the data type for each XML element in the form when you create a binding. While InfoPath reads in the XML schema directly and does this for you, OOo does not. Users are required to specify the data type for each binding, and this requires either looking at the schema or at InfoPath to find out if a control should be a simpleType, simpleContent, Integer (decimal in OOo), _infopath_boolean, etc.

My first thought was that I could either write a Python script or some sort of XSLT trickery to manipulate the XML of the .ODT files to associate entries in the form with the proper data type from the schema (.odt files contain a content.xml which has both the styles that define the appearance of the document and the XML template + bindings), but this was too complicated. I looked at the output of submitting an OOo form to an XML file and noticed that none of the data type information (simpleContent, simpleType, etc) was saved. I compared this with the output from an InfoPath form, and it was the same there as well - no data type information saved. So I asked my mentor about this and we came to the conclusion that it's only necessary for data validation in the form itself (user input) and doesn't need to be kept track of in the bindings. While it's a good idea to make sure certain fields are restricted to integers or dates, it's not necessary for OpenOffice to keep track of whether checkboxes or radio buttons are simpleContent or simpleType - the output will be the same. This successfully cuts a lot of tedium from the form creation process.


Binding Names


Another tedious part of form creation is having to rename each binding after you create it in OOo. Creating new bindings results in bindings called "Binding 1", "Binding 2", etc. These names are somewhat less than descriptive, and I had been renaming them to their XPath expression equivalent - so "Binding 1" becomes "patient/patient.sex", etc. This is functionally irrelevant mind you - OOo doesn't care what it's named, it's just for user reference. Still, I prefer to have meaningful names so they can be easily referred to later.

I opened content.xml in jEdit to see how binding names were handled, and here's what I found:

<xforms:bind nodeset='patient/patient.sex' type='simpleContent' id='Binding 1'/>
is an example of a binding with the default name. This is later referenced in the definition of a control style:

<form:radio form:control-implementation='ooo:com.sun.star.form.component.RadioButton' form:label='M' form:name='nameOfRadioButton' form:image-position='center' form:id='control19' form:value='M' xforms:bind='Binding 1'>
It should be fairly easy to write a script that replaces every instance of 'Binding X' with the "nodeset" value, so that cuts this step out of form creation in OOo as well. All that's left now is to drag elements from the XML tree to the document, change the control type and label/properties, then position them to fit where you want them to - pretty easy stuff.
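
Here's a first rough cut of that renaming script in Python. It's only a sketch based on the two lines shown above: it assumes binding ids look like "Binding N", that they appear only in id and xforms:bind attributes, and that plain text substitution is good enough for content.xml. The function names are mine.

```python
import re

# Matches each <xforms:bind ...> tag so its attributes can be inspected.
BIND_TAG = re.compile(r"<xforms:bind\b[^>]*>")
# Attribute pattern; works with single or double quotes.
ATTR = r"""\b{name}=(['"])(.*?)\1"""
# Attributes that may carry a binding name.
REFERENCE = re.compile(r"""(\b(?:id|xforms:bind)=)(['"])(.*?)\2""")


def attr(tag, name):
    """Return the value of attribute `name` in the tag text, or None."""
    match = re.search(ATTR.format(name=re.escape(name)), tag)
    return match.group(2) if match else None


def rename_bindings(content_xml):
    """Replace default 'Binding N' names with each binding's nodeset value."""
    mapping = {}
    for tag in BIND_TAG.findall(content_xml):
        bind_id, nodeset = attr(tag, "id"), attr(tag, "nodeset")
        if bind_id and nodeset and re.fullmatch(r"Binding \d+", bind_id):
            mapping[bind_id] = nodeset

    def swap(match):
        prefix, quote, value = match.group(1), match.group(2), match.group(3)
        # Values that are not default binding names pass through unchanged.
        return f"{prefix}{quote}{mapping.get(value, value)}{quote}"

    return REFERENCE.sub(swap, content_xml)
```

One thing to watch: if two bindings point at the same nodeset they would end up with the same name, so a real version would need to check for duplicates first.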


template.xml and FormEntry.xsd extraction


Each OpenMRS InfoPath form contains two important files: template.xml and FormEntry.xsd. Template.xml has the actual XML data that's edited by InfoPath. When a form is published, form data entered in the UI is written to the appropriate XML tag in this file. FormEntry.xsd is InfoPath's version of XML schema (one of the reasons OOo doesn't read this in - it's not quite a proper schema, but MS's own version). This contains all of the data type values for each element as well as valid HL7 values.

It's necessary to get both of these files for the form you want to convert to OOo. InfoPath's .xsn files are CAB files, so you need a CAB extractor to get at them. I've found that WinRAR works for Windows, Unarchiver on OS X, and cabextract on Linux. You can find links to these on my wiki instructions, and I'll soon be modifying the instructions to include a section on extracting these files.
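
On Linux, the extraction can be scripted. This is a small Python sketch that shells out to cabextract (it has to be installed already) and then checks that the two files came out. The -q and -d options are cabextract's quiet and output-directory flags. The function names and the case-insensitive file lookup are my own choices.

```python
import shutil
import subprocess
from pathlib import Path

# The two files needed from each InfoPath form.
WANTED = ("template.xml", "FormEntry.xsd")


def cabextract_command(xsn_path, out_dir):
    """Build the cabextract call: quiet mode, extract into out_dir."""
    return ["cabextract", "-q", "-d", str(out_dir), str(xsn_path)]


def extract_form_files(xsn_path, out_dir):
    """Extract an .xsn file and return paths to template.xml and FormEntry.xsd."""
    if shutil.which("cabextract") is None:
        raise RuntimeError("cabextract is not installed or not on PATH")
    subprocess.run(cabextract_command(xsn_path, out_dir), check=True)
    found = {p.name.lower(): p for p in Path(out_dir).iterdir() if p.is_file()}
    missing = [name for name in WANTED if name.lower() not in found]
    if missing:
        raise FileNotFoundError(f"not found in {xsn_path}: {', '.join(missing)}")
    return {name: found[name.lower()] for name in WANTED}
```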


What's Next


After I finish the wiki instructions on form creating in OOo, there are two big tasks to tackle: loading patient data from the webapp and recreating InfoPath's taskpane.

I've been thinking about loading patient data a lot these past few days. Currently, when a user loads a patient's form from the webapp, it sends an .infopathxml file which is a filled-out version of template.xml with instructions at the top so InfoPath knows which form to open. Paul is very adamant that OOo also has this functionality so it fits in with the FormEntry workflow just like InfoPath. I'm confident it can be done, and I've been thinking about several solutions:
As mentioned in a previous post, you can add local or remote instance data to a form in OOo. This can load patient data into your form, but it involves manually clicking "Add instance data" and selecting the XML file containing patient data. Doable, but not at all an elegant, easy solution. I thought perhaps I could write a macro that automatically does this, but that would require users to have a local copy of the form they wish to do this with, and that's not very good either (currently forms are stored on a server and InfoPath opens them remotely).

After a discussion with Justin Miranda on the IRC, an idea popped into my head. Since OOo's .odt files are just zip files containing several XML files, and since all template data is saved in content.xml, why not have a script on the server that grabs the appropriate form, replaces the blank template data in content.xml with the contents of the filled-out .infopathxml file containing patient data, and sends this to the user? Should be quite doable in Python, and from the user's perspective, the only thing that changes is that OOo opens instead of InfoPath. OOo can also open files from remote URLs, making this even easier.
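
To make the idea concrete, here's a sketch of what the server-side step might look like in Python. It rests on assumptions I haven't verified against a real form yet: that the blank template lives inside the first xforms:instance element in content.xml, and that the processing instructions at the top of the .infopathxml file can simply be dropped. The function name fill_form is hypothetical.

```python
import io
import re
import zipfile

# Assumption: the template data sits inside the first <xforms:instance> element.
INSTANCE = re.compile(
    r"(<xforms:instance\b[^>]*>)(.*?)(</xforms:instance>)", re.DOTALL
)
# The XML declaration and InfoPath instructions at the top of the patient file.
LEADING_PI = re.compile(r"^\s*(?:<\?.*?\?>\s*)+", re.DOTALL)


def fill_form(blank_odt, patient_xml):
    """Return the bytes of a copy of blank_odt with patient_xml as its instance."""
    payload = LEADING_PI.sub("", patient_xml)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(blank_odt)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = src.read(item)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                text, count = INSTANCE.subn(
                    lambda m: m.group(1) + payload + m.group(3), text, count=1
                )
                if count == 0:
                    raise ValueError("no xforms:instance element in content.xml")
                data = text.encode("utf-8")
            # Reusing the original entry keeps its order and compression,
            # so the mimetype entry stays first and uncompressed.
            dst.writestr(item, data)
    return out.getvalue()
```

The webapp would call this with the stored blank form and the patient's XML, then send the resulting bytes back as the download.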

Justin also mentioned the idea of having one form for everything and one XML file per patient. That way, every patient would have one form and one XML file. While this would simplify things IMMENSELY, it would require a lot of refactoring of the way OpenMRS works, and you could run into problems caused by adding/removing fields from the forms and would have several different versions of the form running around. Also, both the form and XML would be quite large, and this could be an issue for users who only need to edit a small section of data.

As far as InfoPath's taskpane is concerned, I'm not quite there yet. The taskpane is a tab in InfoPath that executes some JavaScript to pull in remote data via HTTP. This allows users to select information (such as the location of a patient encounter) from a remote URL, all from within InfoPath. While all of this can be accessed via a regular web browser, it's convenient to have everything in one application. The quickest and dirtiest solution would be to connect macros to buttons that open the target URL in the user's $browser, but this loses the "all-in-one-place" appeal. Other options include using small web browsers (via a script) in a floating window in OOo and/or pestering the OOo developers to add this functionality. I welcome any suggestions on this issue.

I hope my progress thus far has been sufficient (I've gotten pretty good feedback from both OpenMRS and OpenOffice.org, so that's good), and I look forward to continuing for the rest of the summer and hopefully beyond as well. I truly enjoy doing this.


The OpenOffice.org Community


In order to work on this project, I've had to involve myself in the OpenOffice.org community. I've been reading and posting to the Users mailing list as well as the XML developer's list and hanging out on the IRC. This has all been immensely helpful and rewarding to me because I'm learning about OpenOffice by reading and answering other users' questions. You learn a lot about a subject when you're helping others because you're forced to really get in there and figure out how something works.

Andrew Pitonyak, author of the previously mentioned book, "OpenOffice.org Macros Explained", has noticed my work with XForms on my blog and the OpenMRS wiki and dealt some flattery my way by suggesting that I write the XForms documentation for the OpenOffice manual at OOoAuthors. I look forward to contributing to this as much as I can. There's quite the lack of OOo XForms documentation out there, and I'd very much like to share what I've learned with others.


All in all, this has been a very exciting and enjoyable summer, and I look forward to what lies (lays?) ahead.

Monday, July 2, 2007

Weekly Summary

I'm headed to the airport in just a few minutes to catch a flight to Florida, so I thought I'd post a quick summary of what I've done this past week to catch everyone up.

In addition to further developing the Adult Initial Verification Form in OpenOffice, I've been posting to the OOo XML developer's list and discovering some new things. The coolest is OO's ability to link to local or remote instance data that will fill in a form with some saved values. From the mailing list:
Yes, initializing an instance from a remote location is possible (given
that that location is reachable).

The dialog that you mention has a "link" check-box. If you link the
instance, it will be retrieved every time the document is loaded. If you
leave it unchecked, the external instance will be copied into the
document and stored there. In case of a document internal instance,
changes that a user makes to the instance data via bound form controls
will be persisted with the document when it is saved.

When you bind a form control to a node in the data navigator, the
pre-filling that you mention should work. The easist way of creating
such a binding is to drag the node from the navigator onto the document...

Bests,
Lars

mharriso@student.umass.edu wrote:
> I've tried OO Users and Writer Users list with this question, and both directed
> me here. Hopefully someone can help me...
>
> Does anyone know if OO is capable of getting XML forms instance data from a
> remote source? MS InfoPath is capable of loading XML data into a form and the
> forms are filled out with pre-defined values. In OO, I've noticed there is the
> option to add instance data, and the window that pops up asks for a name and a
> URL, which I take to mean remote or local. There's a browse button for local
> data, and it gets added to the data navigator, but I can't seem to get it to
> pre-fill the forms.
>
> Example: In my schema, I have . I want to be able to
> load up a file containing Matthew Harrison and
> have the "Name" field on my form filled out with that value.
>
> -Matthew Harrison


I've tested this out, and sure enough it works. Now I need to figure out how to automate the task of loading the instance data. It's doable, I just haven't quite worked out how.

I've also started writing "Creating Forms in OpenOffice" on the OpenMRS wiki and also created a link to it from the project page. I hope to get this finished up by this weekend.

I'll be back on Friday the 6th, and until then I can be reached via email if needed.

Sunday, July 1, 2007

Font Rant 2

Just another small font rant.
Can you believe these are both Times New Roman, 9 point, same size paper, 1680x1050 screen resolution? I can't.

On WinXP:


On OS X (X11):

Sunday, June 17, 2007

A Tale of Four Fonts

This post is a rant.

This is driving me nuts. I'm struggling to find a font in OpenOffice that is a consistent size across Windows, Linux, and OS X. Not only do they not all have the same selection of fonts (they don't even all have Helvetica or Georgia!), but the same font appears differently on all of them. Here are some screenshots using Times New Roman, bold, 11 point.

1. Windows

2. Linux

3. OS X using the X11 OpenOffice

4. OS X using NeoOffice

Saturday, June 16, 2007

OpenOffice XForms and Bindings

Good news! In my last post, I complained about OpenOffice's lack of control when dragging elements from the data navigator to your document window to create bindings automatically. It seemed that you were limited to a LabelField and TextBox, and you could only replace the TextBox with a limited number of controls.

Further research has shown me that these two controls are actually grouped together - hence the lack of control over them. I found that you can ungroup them and you have two separate controls, and can then replace the TextBox with whatever you want.

Here's an example of how to do this in OpenOffice. I'll use the "patient.birthday" element, which we'll need to change to a date field. (Note that the following screenshots are from an awesome program called NeoOffice, an OS X port of OpenOffice that currently uses version 2.1. I've been trying it out, and so far it seems really, really cool.)
1. Drag an element from the data navigator to your document. It will look like this:

2. Ungroup the controls. This can be done via Format | Group | Ungroup, or by adding the Ungroup button to your toolbar:

3. You can then click on the individual controls and edit them. Here we'll replace the TextBox with a date field:

4. You can make a pretty calendar-like date select field by setting the "Dropdown" property to "Yes" (thanks to omarc55 for this tip!):

5. All of your XML data from the schema should be there:

6. You can now also change the LabelField to read something more user-friendly like "Date of Birth:".

I'm pleased as punch to have found this. It should make creating these forms MUCH easier. It also opens up form creation to the less technically-inclined people out there.

Friday, June 15, 2007

Form design in OpenOffice

(Note: After you've read this, see my follow-up post.)

This past week I've been working with InfoPath and OpenOffice.org side-by-side to compare what InfoPath can do vs. what OpenOffice can do. I've started creating the Adult Initial Encounter form in OpenOffice, and so far I've been quite pleased with the results (as have the rest of the OpenMRS gang). Here's a screenshot of part of the form:


I'm noticing that while InfoPath has a lot of drag-and-drop functionality, OpenOffice leaves much of it for the user to do. For example, in order to create a binding in InfoPath, you simply drag an element from the schema over to the document. You can then change what type of control it is (text field, radio button, date field, etc) and it will be nice and bound to your schema:

In OpenOffice, this isn't quite as easy. While you can drag an element from the data navigator directly to the document, you don't seem to have much control over what type of form control gets added. As you can see from the picture, it defaults to a simple text field with a label that I can't seem to change. While it allows you to replace the control with other controls, your choices are very limited. Hopefully this will be further developed in future versions:

In order to bind everything in OpenOffice, I've been selecting the binding from a drop-down list in the data tab of the control properties window. While this is perfectly doable, it's a bit tedious. I've posed a question to the OpenOffice Users mailing list about this, so hopefully I'll have an answer soon.

If there's no better way to do this in the GUI, I may be able to write a script that fills in all of the bindings and default values by editing the XML files inside the .odt file that OpenOffice generates. Both the InfoPath .xsn files and OpenOffice .odt files are archives (.xsn is a CAB file, .odt is a zip file) that contain XML and style sheets defining how the forms look and behave.

When the forms are submitted, they send an XML file to a remote server which then has some XSLT which translates the data to HL7. Here's an example of what this looks like, taken from the first screenshot, question 11a, "Is the patient or their partner currently using any form of family planning?":

If the user of the form checks the "Oral Contraceptive Pills" box, the <oral_contraception/> element is set to true, and 780^ORAL CONTRACEPTION^99DCT is the HL7 that is submitted.
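
On the real server this translation is done with XSLT, but the idea fits in a few lines of Python. In this sketch the lookup table holds only the one mapping mentioned above, and the coded_answers name and the shape of the submitted XML are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Only the mapping quoted in this post; a real table would cover every coded answer.
HL7_CODES = {
    "oral_contraception": "780^ORAL CONTRACEPTION^99DCT",
}


def coded_answers(form_xml):
    """Return the HL7 strings for every known element whose text is 'true'."""
    root = ET.fromstring(form_xml)
    return [
        HL7_CODES[element.tag]
        for element in root.iter()
        if element.tag in HL7_CODES
        and (element.text or "").strip().lower() == "true"
    ]
```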

I'm currently working on the rest of this form, and I'll start work on the others once it's finished. While the Adult Initial Visit form is rather large and tedious, I figure it's a good place to start because it will have a lot of reusable elements that I can copy and paste into other forms.

I'm pretty pleased with progress so far - I think things are going along quite nicely. Comments?

Monday, June 4, 2007

Week One - Getting Started

After meeting with my mentor and organization admin, my task for the first week has been laid out for me. I'm going to walk through the process of creating and using a form in InfoPath, and document how it works with OpenMRS. This serves three purposes. First, it will allow me to understand this process and learn how everything works together. Second, it will show me what needs to be done in OpenOffice, and I'll find out what OOo can already do vs. what I need to make it do. Third, it will allow me to write documentation for this on the wiki for the benefit of other developers and the implementors. Keep an eye here for the documentation as I write it.

Note: I'll add more to this post when I return home, where the work I've already done is.

Thursday, May 31, 2007

Summer of Code as a Learning Experience

As I watched Summer of Code students discussing their projects on the IRC channel, something occurred to me: while SoC is a fantastic learning experience, there isn't much student-to-student learning. Student coders can learn from their projects, mentors, books, Google, etc, but not as much from each other.

I'd love to see more explanation of what other students are doing. Many of the students are light years ahead of me in terms of coding abilities, and I think there's a lot I could learn from them. There are so many projects working with concepts I don't understand and would like to.

People seem to be blogging a lot more this year than they did last year, and that's awesome. The Planet SoC feed seems to be busier this year as well. We should use this increase in blog popularity to teach each other about what we're doing. Not just saying what our project deals with, but taking the time to explain concepts and maybe even mini tutorials - teaching in general.

Anyone else with me?

XForms/OpenOffice Resources

I'm going to use this post to collect useful resources that I find regarding XForms and OpenOffice (and general OpenMRS info). I'll update it as I find more, and I encourage anyone that has anything to add to leave comments.

Using XForms in Office Applications
The OpenOffice.org XML Project
OpenOffice.org module xforms
Wordpress search for "XForms" tag
W3C's XForms Specification
OpenMRS - Administering FormEntry
OpenMRS - InfoPath Notes
OpenMRS Data Model
GUI Architectures (useful MVC info)
OpenMRS - HL7
Office 2003 XML Reference Schemas