Thursday 19 December 2013

Distributed Digital Scholarly Editions

A certain scholar in the field has been talking about distributed DSEs for years. I make no claim as to the originality of the basic idea, but the implementation is a custom extension to AustESE. The basic problem is this. When you create a DSE you rarely have ALL the data in one place. If I give you a DSE of Charles Harpur, do I include all the images of all of his manuscripts? That would be several gigabytes. You probably wouldn't like that. But let's say I gave you 3 megabytes instead, being the text and commentary only. The images could then be referenced via links or pointers to the real data. This is already the case with manuscripts in Europe, because they are all stored on Europeana.

The specific situation I am thinking of is a biography of an author that is stored on a different server. Making a copy of it is simply out of the question, because it is subject to constant editing and we want just one copy to exist. But what if on MY server I stored a link that pointed to it? I could then treat the file just the same as the real resources that reside there. In other words I could have a virtual file system that covered both real files, such as "english/harpur/The nevers of poetry", which would contain an MVD and the text of the poem, and also a file called "english/harpur/biography", which would be a LINK to "http://austese.net/sites/all/modules/austese_repository/api/events/", with a parameter of project=21 and a username and password needed to get it. If I put all that information into the LINK then I could treat the LINK as if IT was the biography and forget about the details of how to fetch it.
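
To make the idea concrete, here is a minimal sketch in Python. It is not the AustESE implementation, just an illustration under my own assumptions: the names RealFile, Link, VirtualFileSystem and fake_fetch are made up for the example, the real file is reduced to a plain string instead of an MVD plus text, the credentials are placeholders, and the fetch step is a stand-in function so that nothing touches the network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Union
from urllib.parse import urlencode


@dataclass
class RealFile:
    """A resource whose content lives on this server."""
    content: str


@dataclass
class Link:
    """A pointer to a remote resource, carrying everything needed to fetch it."""
    url: str
    params: Dict[str, str] = field(default_factory=dict)
    username: Optional[str] = None
    password: Optional[str] = None

    def full_url(self) -> str:
        # Append the query string only when there are parameters.
        if not self.params:
            return self.url
        return self.url + "?" + urlencode(self.params)


class VirtualFileSystem:
    """Maps paths to real files or links and reads both the same way."""

    def __init__(self, fetch: Callable[[Link], str]):
        self._entries: Dict[str, Union[RealFile, Link]] = {}
        self._fetch = fetch  # how remote content is retrieved (injected)

    def put(self, path: str, entry: Union[RealFile, Link]) -> None:
        self._entries[path] = entry

    def read(self, path: str) -> str:
        entry = self._entries[path]
        if isinstance(entry, Link):
            return self._fetch(entry)  # the caller never sees the URL
        return entry.content


def fake_fetch(link: Link) -> str:
    # Stand-in for a real HTTP request; no network access happens here.
    return "<fetched from " + link.full_url() + ">"


vfs = VirtualFileSystem(fake_fetch)
vfs.put("english/harpur/The nevers of poetry", RealFile("text of the poem"))
vfs.put(
    "english/harpur/biography",
    Link(
        url="http://austese.net/sites/all/modules/austese_repository/api/events/",
        params={"project": "21"},
        username="USER",      # placeholder credentials
        password="PASSWORD",
    ),
)

poem = vfs.read("english/harpur/The nevers of poetry")
biography = vfs.read("english/harpur/biography")
```

The caller uses read() for both paths and never needs to know which one is local. A real fetch function would make an authenticated HTTP request using the fields stored in the Link.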

I realise this sounds a lot like handles - intermediary URLs like doi:// - or like a file system on disk with soft and hard links. The difference is that it is implemented as a virtual file system as part of a digital scholarly edition. One advantage is that I can annotate those virtual files even if their address changes, and I can have all the files that make up the 'distributed' edition in one place - the real and the virtual. The real advantage is that you can treat local and remote files as if they were the same thing for the purposes of editing, updating, annotating, reading, etc.
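
The point about annotations surviving a change of address can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: the Edition class, its annotate and relink methods, and the example.org addresses are all hypothetical. The idea is simply that annotations are keyed by the virtual path, not by the remote URL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edition:
    # virtual path -> current remote address
    links: Dict[str, str] = field(default_factory=dict)
    # virtual path -> notes attached to that path
    annotations: Dict[str, List[str]] = field(default_factory=dict)

    def annotate(self, path: str, note: str) -> None:
        self.annotations.setdefault(path, []).append(note)

    def relink(self, path: str, new_url: str) -> None:
        # Only the pointer changes; annotations are keyed by path and stay put.
        self.links[path] = new_url


edition = Edition()
edition.links["english/harpur/biography"] = "http://old.example.org/biography"
edition.annotate("english/harpur/biography", "Check the date of birth.")

# The remote resource moves; the annotation is untouched.
edition.relink("english/harpur/biography", "http://new.example.org/biography")
```

Because nothing outside the link itself records the remote address, moving the biography means updating one entry and nothing else.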