Monday, 6 May 2013

Exporting a digital scholarly edition

One of the most frequently requested features in a digital scholarly edition service is the ability to export the data produced within it. This is a more complex problem than importing, which typically happens in stages: one adds some source texts, then marks them up, then adds annotations, images and formats that change the edition's appearance and function over time. Getting all that data back out again into one archive that can be transported to a new site, or transferred to a different program, can be messy. Adopting a radically new way to store texts for editing, viewing and comparing also makes it harder to simply "export" the material.

So to overcome this I have added a PDEF (portable digital edition format) export function to Calliope. This queries the server for all the files, markup views, annotations, images and formats, and downloads them all in a single .zip or .tar.gz archive. At the moment this is just a service in Calliope: the user invokes it directly, e.g. with curl:

curl -X GET http://localhost:8080/pdef/?DOC_ID=english%2Fshakespeare%2Fkinglear.*\&FORMAT=TEXT\&FORMAT=MVD -o archive.tar.gz
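The same request can also be assembled from a script. The following is only a minimal sketch in Python, assuming the service accepts exactly the DOC_ID and FORMAT parameters shown in the curl command; the function names build_pdef_url and download_archive are hypothetical and not part of Calliope. It percent-encodes the "/" characters in the docid as %2F while leaving the "*" wildcard literal, and repeats FORMAT once per requested format.

```python
import shutil
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_pdef_url(base, docid, formats):
    """Build a PDEF export URL (hypothetical helper, not part of Calliope).

    base    -- service endpoint, e.g. "http://localhost:8080/pdef/"
    docid   -- wildcard document identifier
    formats -- list of FORMAT values, one query parameter each
    """
    params = [("DOC_ID", docid)] + [("FORMAT", fmt) for fmt in formats]
    # quote (rather than the default quote_plus) with safe="*" escapes
    # "/" as %2F but leaves the "*" wildcard as it appears in the curl call.
    query = urlencode(params, quote_via=quote, safe="*")
    return base + "?" + query


def download_archive(url, dest):
    """Stream the server's response to a local file (assumes a plain GET)."""
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    url = build_pdef_url(
        "http://localhost:8080/pdef/",
        "english/shakespeare/kinglear.*",
        ["TEXT", "MVD"],
    )
    print(url)
    # Requires a running Calliope service, so it is left commented out:
    # download_archive(url, "archive.tar.gz")
```

Run against a live service, download_archive issues the same GET request as the curl command above.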

This downloads a file, whose structure is:

This archive can then be uploaded to a new repository using the mmpupload tool (or will be, once I have modified it), effectively installing the DSE in a new location. All the user has to provide is a wildcard document identifier, in this case "english/shakespeare/kinglear.*", and the service will download all the files with that docid prefix. (Because this is part of a URL, the "/"s have to be escaped as "%2F".) Upload supports two other formats: XML and MIXED. MIXED allows text and XML files to be freely intermingled.
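Before moving an archive to a new repository it can be worth checking what actually came down. This is a small sketch using only Python's standard tarfile module. It assumes a .tar.gz download and makes no assumption about how a PDEF archive is laid out internally; list_archive is a hypothetical helper name.

```python
import tarfile


def list_archive(path):
    """Return the sorted names of the regular files in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as tar:
        # Skip directories and other non-file members.
        return sorted(m.name for m in tar.getmembers() if m.isfile())


if __name__ == "__main__":
    import sys

    # Usage: python list_archive.py archive.tar.gz
    if len(sys.argv) > 1:
        for name in list_archive(sys.argv[1]):
            print(name)
```

A .zip download would need the zipfile module instead.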