Monday, July 11, 2011

RDF in ODF: Abiword & Calligra

RDF has been slowly making its way into Office applications. The ODF standard includes support for shipping RDF/XML file(s) inside the zip file that is an odt file. This RDF can also be linked to particular part(s) of the document text so that you and your computer both know where the RDF is most relevant. For example, if "Fred" in the document has his phone number, location, and cake preference in RDF, that can all be linked just to the four characters "Fred" so that it all makes sense. Strange as it might be, not everybody likes Baumkuchen, and it is fairly likely not to be relevant to a stock quote in another part of the document.

RDF has spread to OpenOffice, abiword, KOffice, and Calligra. All of these applications can read and write RDF in text documents. The latter two also include a GUI to allow you to query, inspect, and update the RDF. Since I'm hacking on abiword, I've been throwing around how to best expose RDF to the person using abiword for document editing...

First, this is what Calligra does. The main document window includes an RDF docker which shows you the high level "Semantic objects". These are things which make use of many RDF triples to present a single object type such as a contact, calendar event, location, or explicit train trip. Note that the RDF docker only shows you the semantic objects for the RDF which is relevant to the current document cursor position.

The Document Information window also lets you get at all of the RDF which ships with an ODF file. The Semantic tab is very similar to the RDF docker but shows all the Semantic Objects regardless of where they are relevant in the document (if at all). As you can see below, editing the "Dan" person semantic object you can set their name, nickname, phone number, and homepage. Of course, more information is relevant to people and this whole section should be expanded to cater for that. And yes, for Calligra having a good hookup to Akonadi would be of great use for all.

Contacts use the FOAF RDF schema in Calligra. This allows not only contact information but also the relations between contacts to be expressed. FOAF is about Friends of a Friend after all. Looking at the above you might think name, phone etc are each going to be a triple in the RDF from the document. The triples tab lets you get at that lower level RDF goodness as shown below. A few things to note: while RDF is triples, each object has a type (is it a chunk of text or a link to another subject), and since there are many possible "files" the RDF/XML could have come from, the source file is tracked for each triple so it can go back there on save.
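
As a rough illustration of that last point, a triple as the editor sees it can be modelled as a small record with more than three fields. This is a minimal Python sketch, not Calligra's actual data structure: the field names, the ObjectType enum, the subject URI, and the file name contacts.rdf are all assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectType(Enum):
    """Whether the object is a chunk of text or a link to another subject."""
    LITERAL = "literal"
    URI = "uri"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    obj_type: ObjectType
    source_file: str  # which RDF/XML file inside the ODF zip it was loaded from


FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical triples for the "Dan" contact; the subject URI and the
# file name are invented for illustration.
dan = "http://example.com/people/dan"
triples = [
    Triple(dan, FOAF + "name", "Dan", ObjectType.LITERAL, "contacts.rdf"),
    Triple(dan, FOAF + "homepage", "http://example.com/~dan",
           ObjectType.URI, "contacts.rdf"),
]


def group_by_source(items):
    """Group triples by originating file so each can be written back on save."""
    by_file = {}
    for t in items:
        by_file.setdefault(t.source_file, []).append(t)
    return by_file
```

Calling group_by_source(triples) gives one list per RDF/XML file, which is roughly what a save step would need in order to put each triple back where it came from.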

Notice in the above that prefixes are included in the subject, predicate, and object columns. This is an attempt to make the raw RDF less verbose and somewhat simpler to handle. The namespaces tab lets you set these up. Any namespaces that are used in the RDF/XML from the ODF file are automatically added and used for you.
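
The prefix handling can be sketched in a few lines of Python. This is only an illustration of the idea, not the code Calligra uses; the shorten and expand helper names are hypothetical, while the foaf and rdf namespace URIs are the standard ones.

```python
NAMESPACES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def shorten(uri, namespaces=NAMESPACES):
    """Return prefix:local if a known namespace matches, else the URI unchanged."""
    # Prefer the longest matching namespace so nested namespaces resolve sensibly.
    best = None
    for prefix, base in namespaces.items():
        if uri.startswith(base) and (best is None or len(base) > len(best[1])):
            best = (prefix, base)
    if best is None:
        return uri
    return best[0] + ":" + uri[len(best[1]):]


def expand(name, namespaces=NAMESPACES):
    """Inverse of shorten: turn prefix:local back into a full URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return name
```

With this, shorten("http://xmlns.com/foaf/0.1/name") yields "foaf:name", and anything in an unknown namespace is shown in full.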

The stylesheets tab I'll cover at another time. The SPARQL tab lets you run a query against all the RDF for the document. The one I've run here is the default one that Calligra shows you, which will select all the triples from the document without restriction. The subject, predicate, and object resulting are shown in the bottom half of the window.
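
For readers who have not met SPARQL, a "select everything" query is tiny. The Python sketch below holds one such query as a string (the exact text Calligra pre-fills may differ) and mimics it with a plain pattern matcher over tuples, where None plays the role of an unbound ?variable. The match function and the sample triples are hypothetical and only there to show the idea.

```python
# A SPARQL query that selects every triple without restriction.
# The exact default text shown by Calligra is an assumption here.
SELECT_ALL = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts like an unbound variable."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# Made-up document triples using prefixed names.
doc_triples = [
    ("ex:dan", "foaf:name", "Dan"),
    ("ex:dan", "foaf:nick", "dan"),
    ("ex:fred", "foaf:name", "Fred"),
]

everything = match(doc_triples)                  # same rows SELECT_ALL would give
names_only = match(doc_triples, p="foaf:name")   # restrict the predicate
```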

I was thinking about all of this recently because I'm now looking to add GUI stuff to abiword to allow RDF interaction. The first idea was to simply add an "edit RDF" context menu item to allow you to associate one or more triples with the cursor position or current selection. The ability to define and reuse namespaces would also help to make such a dialog less painful to use. This brings the design close to the combined "Triples" and "Namespaces" tabs of the Calligra Document Information window. This might be OK for determined users who already really, really know they want to do these things. But I tend to think there are more folks who could take advantage of using RDF but do not necessarily care about it.

Simplicity for users was the driving force behind the design of Semantic Objects and the use of Drag and Drop to and from other applications to create and harvest RDF data. I think it is much simpler to grab the "Fred" contact from Evolution and drop it into the document than to work out that you want to use FOAF and the exact predicates to create a well defined RDF graph for the Fred contact and then copy and paste each of those pieces of data individually.

One might like to consider the Triples+Namespaces as a special type of Semantic Object, a "raw" object if you will. This brings together the design of the advanced and user friendly interaction into a single dialog. As the namespaces are likely to have whole document scope they can be set up and edited elsewhere. Unfortunately I had a bit of trouble working out how to populate a tree or list in glade-2 or glade-3 for mock ups, so these are gimped a bit too.

The dialog below is a semantic object editor with the advanced tab allowing raw interaction. As there can be zero or more semantic objects of a given type in scope at any point there is a list on the left side allowing you to choose which object of a type to view. Perhaps that should be a drop down list at the top of the tab to save screen space.

The email and VoIP links should start a new message or request a phone call with the person respectively. Such actions should also be available without getting to the editor itself. My current plan is to have the advanced tab allow interaction with the raw triples. Remember though that triples carry type information, possibly extra context, and/or perhaps a range of the revisions in a change tracked document that the triple is valid for. So it's by no means just a list with three columns as the name triples might at first imply.

A somewhat problematic first blush at this gives the below. I'm thinking that the subj, pred, and object strings can be namespace:foo strings, possibly with some completion for known namespaces like foaf, et al. The type is fairly OK as these are fixed and mainly URI or Object.

The revision range selection is a real challenge. This might become some sort of date range bar like the timeline or timeplot from the simile widgets. The trick, as usual, is extrapolating the extra dimension from what is in its vanilla sense a linear one dimensional data set (time, revision). Though having the revisions and their descriptions in the top half of the timeline and the ability to pan and zoom seeing a density plot in the lower half would work for starters.

I'm thinking that, as well as showing you all the triples, allowing a simple one or two line SPARQL query to be run to find the triples to edit would be preferable. Perhaps it doesn't add much for a small document range with only 20 triples associated, but to use the dialog on the whole document too, you might want to limit triples to "current revision" and foaf related only, for example. Using a triple list allows you to sort by column and search, but such a search could also be performed with relatively simple SPARQL. And, extremely unfortunately, one normally doesn't get to stable sort lists by 2+ columns. A limitation I try to avoid inflicting.
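
To make the filtering and multi-column sorting concrete, here is a small Python sketch. It is not abiword code: the row layout, the sample data, and the current_foaf and sort_by_columns helper names are assumptions for illustration. It leans on the fact that Python's list sort is stable, so sorting by the least significant column first and the most significant column last gives a proper multi-column ordering.

```python
from operator import itemgetter

# Rows are (subject, predicate, object, first_rev, last_rev).
# last_rev of None means the triple is still valid. All data is made up.
rows = [
    ("ex:fred", "foaf:phone", "555-0100", 1, 3),
    ("ex:fred", "foaf:phone", "555-0199", 4, None),
    ("ex:dan", "foaf:name", "Dan", 1, None),
    ("ex:fred", "foaf:name", "Fred", 1, None),
    ("ex:fred", "ex:likesCake", "Baumkuchen", 2, None),
]


def current_foaf(rows, revision):
    """Keep only foaf: triples that are valid at the given revision."""
    return [
        r for r in rows
        if r[1].startswith("foaf:")
        and r[3] <= revision
        and (r[4] is None or revision <= r[4])
    ]


def sort_by_columns(rows, *columns):
    """Sort by several columns; earlier columns take priority."""
    # list.sort is stable, so sorting from the least significant column
    # to the most significant one yields a multi-column ordering.
    result = list(rows)
    for col in reversed(columns):
        result.sort(key=itemgetter(col))
    return result


# foaf triples valid at revision 5, ordered by subject and then predicate.
view = sort_by_columns(current_foaf(rows, 5), 0, 1)
```

Here view holds Dan's name, then Fred's name, then Fred's current phone number; the superseded phone number and the cake preference are filtered out.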

So in summary, raw triple editing can be just an advanced semantic object. The list of semantic objects should be able to be found from a document position (cursor) or arbitrary begin-end range. The latter caters for whole document RDF editing as a special case. For contacts there might be one or more semantic objects for any doc position or range, but there will only be one raw-triple semantic object for any range.
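
That lookup can be sketched as follows. This is a hypothetical Python model rather than anything from the abiword source: the SemanticObject fields, the character offsets, and the function names are all assumptions. The point is only that a cursor lookup is a one-character range query, the whole document is just a big range, and every query yields exactly one raw-triple object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticObject:
    kind: str    # e.g. "contact", "event", or "raw"
    name: str
    start: int   # first character offset the RDF is linked to
    end: int     # one past the last linked character offset


def objects_in_range(objects, begin, end):
    """Return typed objects overlapping [begin, end) plus one raw-triple object."""
    hits = [o for o in objects if o.start < end and begin < o.end]
    # Exactly one raw object per query, covering the requested range.
    raw = SemanticObject("raw", "triples", begin, end)
    return hits + [raw]


def objects_at(objects, position):
    """A cursor position is treated as a one-character range."""
    return objects_in_range(objects, position, position + 1)


# Made-up document: "Fred" occupies offsets 10..13, "Dan" offsets 40..42.
doc_objects = [
    SemanticObject("contact", "Fred", 10, 14),
    SemanticObject("contact", "Dan", 40, 43),
]

at_cursor = objects_at(doc_objects, 12)                # cursor inside "Fred"
whole_doc = objects_in_range(doc_objects, 0, 100)      # whole document case
```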

Though I'm still chucking around how to make the query/edit part most convenient for users for the raw triples semantic object.


damian said...

1 semi-unrelated question, are those toolbar fonts default in the next version of calligra words?, are they changeable?

monkeyiq said...

No that is not the default font, and yes you can still change it.