Saturday 8 December 2018

The trouble with DIVs

A <div> is a division in an HTML or XML document that can be nested. It is typically used to provide a higher-level structure to a document that would otherwise be just a succession of paragraphs and character ranges. But here is the key point: word-processors don't use DIVs. Everyone has used a word-processor like Word, and we don't notice any restrictions on what we can do in those programs. Quite the contrary: usually there are too many formatting options to choose from. So what exactly are DIVs good for, and can we get rid of them altogether?

DIVs would appear to be useful for two main reasons:

  1. They provide a logical organisation of an otherwise complexly marked-up document that allows the encoder to apply a divide-and-conquer strategy to getting the job done. Each DIV corresponds to some kind of logical unit: a section of notes added to a poem, a prologue, a title page etc.
  2. They can be used to provide extra formatting to sections of a document. We might want to add extra white space at the end of a poem and the start of its notes. So we can attach that white space to the DIVs in question.

The first point is a way to overcome the inherent complexity of markup, of which DIVs themselves form a part. They are thus self-justifying: they help manage a complexity that they help create. In a WYSIWYG editing environment DIVs are not only unnecessary but greatly increase the complexity of the user interface. That is bad news for documents that need to be created online in a crowd-sourcing scenario.

The second use of DIVs can be met in CSS by just adding extra space or special formats to the last or first instance of a class of paragraph.

So neither requirement adds any significant functionality to editing itself, and DIVs would thus appear to be entirely dispensable. This runs counter to what XML aficionados keep telling me: that plain text is not expressive enough. They use this as a justification for complex XML, but in our textual model variants and other alternatives (the main reason for complex markup) are expressed through layers and versions, leaving each version of a document (a draft, a stage in its correction etc.) quite simple already. If we can also dispense with DIVs, then at least 95% of the complexity of XML can be dispensed with, and hence we do not need XML at all.

A different model of text

HTML uses a textual model built on a weak hierarchy of DIVs, paragraphs and characters. For cultural heritage (historical) texts what you actually need is a hierarchy of paragraphs, lines and character-ranges. And this applies not only in poetry, which is structured around lines and stanzas, but also in prose where transcriptions of historical texts should preserve the line-breaks of the originals. This greatly facilitates transcription and checking, and also allows a digital reconstruction of the textual content.

In our WYSIWYG web-editor we use single line-breaks to indicate a new line and double line-breaks to indicate a new paragraph. There are no DIVs. Formats can be applied to paragraphs, lines and character ranges.
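The line-break convention just described can be sketched in a few lines of Python. This is only an illustration, not the Ecdosis implementation: the function name parse_text is hypothetical, and treating a run of more than two line-breaks as a single paragraph break is an assumption of mine.

```python
import re


def parse_text(text: str) -> list[list[str]]:
    """Split plain text into paragraphs, each of which is a list of lines.

    Hypothetical helper: a single line-break starts a new line, and a
    double line-break starts a new paragraph.
    """
    # Normalise Windows and old Mac line endings so "\n" is the only line-break.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    # Assumption: two OR MORE consecutive line-breaks count as one paragraph break.
    for block in re.split(r"\n{2,}", normalised.strip("\n")):
        if block:
            # Within a paragraph, each single line-break marks a new line.
            paragraphs.append(block.split("\n"))
    return paragraphs


sample = "First line of stanza\nSecond line of stanza\n\nA one-line paragraph"
for number, lines in enumerate(parse_text(sample), start=1):
    print(number, lines)
```

The result is exactly the paragraph/line hierarchy argued for above, with no enclosing container for groups of paragraphs; character ranges would then be offsets within a line.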

[Screenshot: Ecdosis WYSIWYG editor]

On the top left there is a dropdown list of documents to edit. Below that is a list of applicable formats divided into the three categories already mentioned (paragraphs, lines and character ranges). Next to that is the version menu, from which one can select a version of the document to edit. There might be several, e.g. a newspaper printing of a poem, a manuscript and a book version. On the right-hand side there is a layer tab, which represents the final state of the text. Earlier states of the same version can be created by clicking the plus-tab. Layers are edited individually: the save button saves one layer of one version of one document, while publish combines all the versions and layers into one document for viewing on the web. I am building a sandbox tutorial site where I am putting up examples for training in the Ecdosis editing system.
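The nesting of documents, versions and layers can be pictured as a small data model. The following Python sketch is a guess at the shape, not the editor's actual storage format: the class names, the save_layer function and the choice to keep the final state as the last item in a list are all assumptions made for illustration. Publishing is not modelled, since how versions and layers are combined is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    """One version of a document, e.g. a manuscript or a book printing."""
    name: str
    # Assumption: layer texts are stored earliest first, final state last.
    layers: list[str] = field(default_factory=lambda: [""])

    def add_earlier_layer(self, text: str = "") -> int:
        """Insert a new earlier state just before the final layer.

        Returns the index of the new layer (hypothetical plus-tab action).
        """
        index = len(self.layers) - 1
        self.layers.insert(index, text)
        return index


@dataclass
class Document:
    """A document holding any number of named versions."""
    title: str
    versions: dict[str, Version] = field(default_factory=dict)


def save_layer(doc: Document, version: str, layer: int, text: str) -> None:
    """Overwrite one layer of one version; nothing else is touched."""
    doc.versions[version].layers[layer] = text


poem = Document("A poem")
poem.versions["manuscript"] = Version("manuscript", ["final manuscript text"])
poem.versions["book"] = Version("book", ["book text"])

draft = poem.versions["manuscript"].add_earlier_layer("first draft")
save_layer(poem, "manuscript", draft, "first draft, corrected")
print(poem.versions["manuscript"].layers)
```

Because each save addresses a single layer of a single version, every individual text stays simple, which is the point made earlier about layers and versions replacing complex markup.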