Tuesday 26 July 2016

Sync-scrolling images and text when editing transcriptions

When transcribing original documents it is helpful to have the page image next to the transcription as it is being written. That way the transcriber can see what the next word to transcribe is, or quickly check for mistakes in some one else's transcription. Most documents contain more than one page, so this gives rise to a problem: how can the relevant page image be placed next to the relevant portion of the transcription so that the transcriber can easily see what corresponds to what?

One obvious solution is to transcribe page by page. For each image show only the transcription of that page. However, this creates a problem for the technician and user alike: now the transcription of the document is divided into parts which must be stitched back together not only by the computer when the document is saved, but also mentally by the transcriber. Pages rarely end at paragraph end. More often they split in the middle of a sentence or even a word. The transcriber may change only one page and then the computer must reassemble the document with that one altered page in its middle. If the page is marked up in some way the page's transcription may not be complete or well-formed, which would hamper editing. All this is both technically messy and counter-intuitive for the user.

A better method is to display the entire document for editing: both the transcription as a continuous running text and the page images to which it corresponds as a scrolling list. The reason this is not often done is because of an intrinsic alignment problem: how to find the part of an image that corresponds to the currently displayed piece of text. To be readable page-images may need to be higher than the screen. Typically the transcription of a page is much shorter. But as a general rule the centre of the page's transcription should be aligned with the centre of the corresponding page-image across the centre of the screen. This is what the user expects. However, this creates a problem: the first and last pages cannot possibly conform to this rule. The first page must be aligned so that the top of the page image aligns with the top of the transcription text. And likewise at the bottom: on the last page the end of the transcription must correspond with the bottom of the last page-image.

Live sync-scrolling

To achieve live sync scrolling we need a table or function that gives the left-hand scroll position for each possible right hand scroll position.

What we musn't do is make the list of images itself scrollable. If we do that then we will have to link its own scrolling with that of the scrolling text. Since the right-hand-side (RHS) controls the scrolling of the left-hand-side (LHS) we will get infinite feedback if we link the scrolling in both directions. 'Scrolling' on the LHS can be achieved by other means, for example, by varying the CSS 'top' property of the overall list of images.

The scrolling positions for the LHS are just the mid-points of each image in the overall list. These correspond to the mid-points for each page of text in the RHS. The latter can be found easily by parsing the text. In my case, since I use a minimal markup language (MML) page-starts are marked by [NNN] where NNN is the page-number on a line by itself. Any scrolling position in-between two of these corresponding values can be interpolated by scaling. However, this does not work for the first and last pages because the desirable alignments in these cases are the top of the first image with the top of the first page of text, and likewise the end of the text with the bottom of the last image. So my solution was just to replace the mid-points of the first page in both the LHS and RHS with half the window-height. Likewise for the last mid-points I used the length of the text and the length of the image-list minus half the window height. In some cases 'half-way from the top or bottom of the window' may be in the middle of a page that is not the first or last. In that case the overlapping values can simply be removed, as long as the ones at the extreme ends of the list are preserved.

There is a demo of this method on the Charles Harpur Critical Archive, in Twin View for Harpur's poem Shelley.