Friday, 27 June 2014

TILT is making good progress

Check out the TILT blog. You can also follow us on Twitter @bltilt.

Wednesday, 7 May 2014

nmergec: a better way to collate and merge variant texts

The original idea of multi-version documents (MVDs) in 2009 was intended to address a common problem when editing cultural heritage texts: often the source documents are not cleanly reproduced texts that only exist in one version. In reality most historical documents exist in multiple drafts, have internal corrections, or are published as a set of editions or manuscript copies. An MVD is designed to gather all this information into a single digital bundle that can be commented on, or displayed in a variety of ways, because its structure mimics that of the documents it represents.

The basic principle of an MVD is that text occurring multiple times is only represented once, whereas the generally accepted method is to transcribe each physical document representing a work as a separate digital file, and to record internal variations within each document via markup. But this raises three serious difficulties:

  1. Markup was originally designed to represent formats as applied to linear text. Using it to describe non-linear text, as found in internal variations, leads to a conundrum: how can markup be used to describe changes in formats or textual structure through variation, when markup is already used to describe those very features? For example, two paragraphs joined up by an editorial stroke, or a cancelled format. Markup can't describe itself.
  2. There is a lot of redundancy between the versions represented as separate documents. If each of 20 versions has the same text at the same location, then why record it 20 times? And every time you copy or edit that piece of text you have to make sure it is the same throughout the 20 copies. This makes a lot of work for no reason. Computers are supposed to reduce work, not increase it.
  3. Comparison and annotation of the multiple files creates a logistical problem: each comparison of two texts repeats a costly computation, whose result is the same each time. Comparing two documents with internal versions is practically impossible, unless you throw away the internal variation, in which case why record it in the first place? And to annotate a piece of text shared between versions the same annotation has to be applied to each copy separately.

There has to be a better way to reorganise the underlying data of a work so that the information structure facilitates the kinds of operations the user wishes to perform upon it. This is the motivation behind the MVD.

Variant graphs

The basic idea of the MVD is the variant graph. Instead of representing text as a linear stream of characters it represents it as a branching stream that diverges or merges as required. Consider this short segment of Charles Harpur's Social Charity, MS C376 (and this is just one of five realisations of this poem):

In this one manuscript one can discern five layers:

Yet even that one subject is to starts
Of evil: in the clearest well thus lies

Yet even that one's prone to starts of wrong
As ever: in the clearest well their lies

Yet even he shall sometimes prove insure:
So: in the clearest fountain their lies

Yet even he shall sometimes prove insure:
So: in the clearest fountain there lies

Yet even he shall sometimes prove insure:
As in the clearest fountain ever lies

The variant graph of all the above looks like this:

This covers the text from "Yet even" to "in the clearest", which are shared by all five versions. Note that the resolution of this variant graph is the character. Although for some uses a word-granularity is better, for side-by-side display the character-level granularity is superior. It doesn't leave the user guessing as to exactly what is different in any two versions. Also, if desired, it is possible to increase granularity to word-level, but if you start from word-level it is harder to descend to character level.

It is important to realise that this is an internal data structure of a program written out to explain how it works, not a display format. Its purpose is to increase efficiency by removing redundancy, which it clearly does. Even better from the efficiency point of view the above complex graph with all its nodes and arcs can be reduced to a simple list of textual fragments belonging to versions:

[1-5]Yet even
[3-5]he s
[1-2]t
[1-5]ha
[3-5]ll sometime
[1-2]t one
[2]'
[2-5]s pro
[3-5]ve in
[2]ne
[1]
[1,3-5]su
[3-5]re:\n
[4]
[3,5]So
[1]bject is
[1-2] to starts
[2] of wrong
[1]
[1-2]\n
[2,4]As
[4]
[2]
[1]Of
[1-2] ev
[2]er
[1]il
[1-3,5]:
[1-5] in the clearest

THIS is what an MVD is, not the graph as such, although this is also a graph, just expressed differently. The list can be converted back into the graph by following three simple rules:

  1. Forced: if one fragment's versions intersect with the versions of the following fragment, then they define a node in the graph, to which the first and the second fragments attach as incoming and outgoing arcs respectively. For example, "to starts" (versions 1,2) intersects with "of wrong" (version 2). In the graph above this corresponds to a node between these two arcs.
  2. Overhang: If a fragment is not attached as outgoing to the preceding fragment by the forced rule, it attaches to the last fragment with which it intersects. For example, "bject is" (version 1) attaches as outgoing to the same node as the earlier "su" (versions 1,3-5), forming the words "subject is".
  3. Reverse overhang: All fragments not attached as incoming by rule 1 attach to the node of the next fragment in the list with which they intersect. So "ve in " (versions 3-5) attaches to "su" (versions 1 and 3-5) several fragments lower.
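
As a rough illustration of rule 1 only, here is a minimal Python sketch. It is not nmergec's code (which is written in C); the function name forced_joins and the representation of a fragment as a (set of versions, text) pair are my own assumptions. The five fragments are copied from the list above.

```python
# Five consecutive fragments copied from the list above, each written as
# (set of versions, text). This pair representation is an assumption made
# for illustration, not nmergec's internal format.
fragments = [
    ({1}, "bject is"),
    ({1, 2}, " to starts"),
    ({2}, " of wrong"),
    ({1}, ""),
    ({1, 2}, "\n"),
]


def forced_joins(fragments):
    """Return index pairs (i, i + 1) of adjacent fragments that share a version.

    Under the forced rule each such pair defines a node: fragment i is an
    incoming arc and fragment i + 1 an outgoing arc.
    """
    joins = []
    for i in range(len(fragments) - 1):
        versions_here, _ = fragments[i]
        versions_next, _ = fragments[i + 1]
        if versions_here & versions_next:  # non-empty intersection
            joins.append((i, i + 1))
    return joins


print(forced_joins(fragments))
```

The pair (1, 2) is the " to starts" / " of wrong" node mentioned in rule 1. The pair (2, 3) is missing because version 2 and version 1 do not intersect, so those fragments would have to be attached by the overhang rules instead.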

Any list of fragments so composed, no matter how many versions it contains, no matter how complex, can be turned into a variant graph using these three rules. This is a lot simpler than the earlier nmerge program, which required five rules. The difference is that nmergec generates restricted variant graphs that have this property. So not every variant graph can be decomposed into such a list, only those generated by nmergec.

Some digital humanists have recognised that the variant graph is an intuitive way to conceive textual variation, but they only see it as a display format. But it is so much more than that. In fact the only way to represent complex cases of variation is to use the list of fragments, because in that way complexity does not increase as more versions are added. An MVD is thus best understood as an efficient data representation of the underlying content of a work.
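
To make the "list of fragments" idea concrete, here is a small Python sketch of how a single version can be read back out of such a list. The three-version example text is made up for illustration (it is not from Harpur), and the names parse_versions and read_version are hypothetical, not part of nmergec.

```python
def parse_versions(spec):
    """Turn a version label such as "[1,3-5]" into the set {1, 3, 4, 5}."""
    versions = set()
    for part in spec.strip("[]").split(","):
        if "-" in part:
            low, high = part.split("-")
            versions.update(range(int(low), int(high) + 1))
        else:
            versions.add(int(part))
    return versions


def read_version(fragments, version):
    """Concatenate, in list order, the text of every fragment the version owns."""
    return "".join(
        text for spec, text in fragments if version in parse_versions(spec)
    )


# A made-up three-version example in the same notation as the list above.
fragments = [
    ("[1-3]", "the "),
    ("[1-2]", "quick "),
    ("[3]", "slow "),
    ("[1]", "brown"),
    ("[2-3]", "red"),
    ("[1-3]", " fox"),
]

for v in (1, 2, 3):
    print(v, repr(read_version(fragments, v)))

stored = sum(len(text) for _, text in fragments)
separate = sum(len(read_version(fragments, v)) for v in (1, 2, 3))
print(stored, "characters stored instead of", separate)
```

Reading a version is just a filter over the list, which is why the cost of the representation grows with the amount of variation rather than with the number of copies.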

Of course, getting the list of fragments in the first place is the problem. And that is what nmergec is designed to solve.

How nmergec works

Collation programs examine two versions at a time within a window of 50 words or a page. The first matching word they find between the two versions is taken as a position where the two texts are the same, and then the window is moved on by one word. Anything that did not match is a variant, insertion or deletion. The problem with this is that you can only see alignments within the window, and often they are outside it. And alignment word by word is incredibly error-prone. You can easily have transpositions, insertions or deletions that throw it out of alignment. That's why some early collation programs needed human input to help them.

CollateX, Juxta and Juxta Commons use comparison tools adapted from computer code comparison, which were designed to work at the granularity of lines, not words. They have been adapted to work with words, and as a result they are slow, and can only recognise small transpositions over short distances. The fact is, many transpositions occur on a much greater scale. Sometimes whole chapters in a novel are rearranged between drafts. For example, Wittgenstein rewrote his manuscripts by cutting them up into bits and rearranging them. We need something that is much more powerful and designed for the purpose.

MUM to the rescue!

A Maximal Unique Match or MUM is the longest unique common sequence of matching characters between two versions. It has to be unique because the longest match might occur several times, and aligning on one of those would probably be a mistake. Imagine a text with 20 spaces at the start of each line. Once the longest sections of actual text had been aligned those sequences of spaces would be the longest matches left. So it has to be unique. Nmergec uses the powerful Ukkonen suffix tree algorithm to find MUMs, and it uses that to find the best point to align two texts. In the example above the MUM between versions 1 and 2, ": in the clearest well th", is marked in bold:

Yet even that one subject is to starts
Of evil: in the clearest well thus lies

Yet even that one's prone to starts of wrong
As ever: in the clearest well their lies

If we imagine that the first version is the MVD and the second one is the one we are adding to it then the leftovers after accepting this alignment will be "Yet even that one's prone to starts of wrong As ever" and "eir lies". Now all we have to do is to repeat the process on each of these fragments recursively until all that can be aligned is aligned. This is called a greedy algorithm: it gobbles up the biggest bits first, then progressively smaller sections. It's perhaps not perfect but it is undeniably fast, or as fast as these things get. Extending it to work on a full MVD that already has many versions is not hard: we just choose a match the same way, but between the whole graph and the new version. I won't draw it because it will be too messy, but it is the same principle.
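
The recursion just described can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it finds the MUM by brute force rather than with a suffix tree, it handles two plain strings rather than an MVD, it ignores transpositions, and the names find_mum, greedy_align and the min_len cut-off are hypothetical.

```python
def occurs_once(text, sub):
    """True if sub occurs exactly once in text (overlapping repeats count)."""
    first = text.find(sub)
    return first != -1 and text.find(sub, first + 1) == -1


def find_mum(a, b, min_len=1):
    """Longest substring occurring exactly once in a and exactly once in b.

    Brute force, trying the longest candidates first. Fine for short
    examples, far too slow for real texts.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        for start in range(len(a) - length + 1):
            candidate = a[start:start + length]
            if occurs_once(a, candidate) and occurs_once(b, candidate):
                return candidate
    return None


def greedy_align(a, b, min_len=4):
    """Align on the biggest match first, then recurse on the leftovers."""
    mum = find_mum(a, b, min_len)
    if mum is None:
        return []
    i, j = a.index(mum), b.index(mum)
    left = greedy_align(a[:i], b[:j], min_len)
    right = greedy_align(a[i + len(mum):], b[j + len(mum):], min_len)
    return left + [mum] + right


v1 = "Yet even that one subject is to starts\nOf evil: in the clearest well thus lies"
v2 = "Yet even that one's prone to starts of wrong\nAs ever: in the clearest well their lies"

print(repr(find_mum(v1, v2)))
for match in greedy_align(v1, v2):
    print(repr(match))
```

With the two Harpur layers as input the first match found is ": in the clearest well th", and the recursion then picks up the shorter shared runs on either side of it.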

Transpositions

After the first alignment subsequent alignments don't always match with the "opposite" text. An example is:

I had written him a letter which I had, for want of better
Knowledge, sent to where I met him, years ago, down the Lachlan.

I had written him a letter which I had, for want of better
Knowledge, sent to where I met him down the Lachlan, years ago,

I've marked the transposition in bold. After aligning the longer text that surrounds it, nmergec eventually computes the alignment ", years ago". But it only fits on the other side of an already aligned section:

The big change in nmergec is that it treats transpositions in just the same way as direct alignments. The only difference is that transpositions are assessed based on length and distance between the two halves. If they are too far apart for a given length they are rejected. For example, the "transposition" of the word "is" between the beginning and end of a chapter is not a transposition at all, but the transposition of "is" between "is there" and "there is" is a transposition. Transpositions are not copied, but the half in the new version "points to" the first half in the already existing MVD, like a child to its parent.
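
The length-versus-distance test might look something like the following Python sketch. I am not giving nmergec's actual formula here: the linear rule and the max_ratio value of 10 are assumptions chosen purely to illustrate the idea that a longer match may travel further.

```python
def accept_transposition(length, distance, max_ratio=10):
    """Accept a candidate transposition only if it has not moved too far.

    length    -- number of matching characters in the moved text
    distance  -- how far apart the two halves are, in characters
    max_ratio -- hypothetical tuning constant, not taken from nmergec
    """
    return distance <= length * max_ratio


# "is" swapping places within "is there" / "there is": short but very close.
print(accept_transposition(length=2, distance=6))

# "is" matched between the start and end of a chapter: short and far apart.
print(accept_transposition(length=2, distance=50000))
```

Any monotonic rule of this kind has the property described above: the chapter-wide "is" is rejected while a whole paragraph moved the same distance could still be accepted.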

Rafting

As we sail down the rapids, bits of our former boat can be lashed together to form a raft that may save our lives. Similarly, matches between versions are often not exact. There may be a different type of quotation mark, or a small change, but the match basically goes on after that. The problem is, with suffix trees or any kind of literal match, these small variations break up the matching process into chunks that, for example, mess up the assessment of whether something is a transposition or not. Also the greedy algorithm described above relies on choosing the biggest chunk first to get the best alignment between two texts, and this influences which bits are transposed and which bits get aligned "directly". So rafting is chaining together matches that have a short section of mismatch, but continue afterwards. Our Clancy of the Overflow example continues:

He was shearing sheep when I knew him, so I sent the letter to him, Just `on spec', addressed as follows, `Clancy, of The Overflow'.

He was shearing when I knew him, so I sent the letter to him, Just 'on spec', addressed as follows, 'Clancy, of The Overflow'.

In this case the MUM is composed of three sections: 1) "when I knew him, so I sent the letter to him, Just ", 2) "on spec', addressed as follows, " and 3) "Clancy, of The Overflow'.". In between, version 1 uses back-quotes for the opening quotes rather than the single quotes of version 2. So these three sections get assessed as one block, but are aligned as three separate blocks. So "matches" in the variant graph or MVD are still literal matches, with little variants in between. Up to two characters may mismatch in the MVD or in the new version and the match will be allowed to continue.

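Rafting can be sketched roughly as follows in Python. This is my own simplified illustration, not nmergec's code: it works on two plain strings starting from their first characters, the names raft and find_resume are hypothetical, and so is the min_resume parameter (how many characters must agree before the match is considered to have resumed). Only the limit of two mismatched characters comes from the description above.

```python
def find_resume(a, b, i, j, max_gap, min_resume):
    """Look for a place just past a[i] / b[j] where the texts agree again.

    Skips up to max_gap characters on either side and returns the new
    (i, j), or None if the match does not resume.
    """
    for skip_a in range(max_gap + 1):
        for skip_b in range(max_gap + 1):
            if skip_a == 0 and skip_b == 0:
                continue
            ahead_a = a[i + skip_a:i + skip_a + min_resume]
            ahead_b = b[j + skip_b:j + skip_b + min_resume]
            if len(ahead_a) == min_resume and ahead_a == ahead_b:
                return i + skip_a, j + skip_b
    return None


def raft(a, b, max_gap=2, min_resume=3):
    """Chain literal matches from the start of a and b across short mismatches.

    Returns the literal sections separately, since each one is still
    aligned as its own block.
    """
    sections = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = i
        while i < len(a) and j < len(b) and a[i] == b[j]:
            i += 1
            j += 1
        if i > start:
            sections.append(a[start:i])
        resume = find_resume(a, b, i, j, max_gap, min_resume)
        if resume is None:
            break
        i, j = resume
    return sections


v1 = "when I knew him, so I sent the letter to him, Just `on spec', addressed as follows, `Clancy, of The Overflow'."
v2 = "when I knew him, so I sent the letter to him, Just 'on spec', addressed as follows, 'Clancy, of The Overflow'."

for section in raft(v1, v2):
    print(repr(section))
print(sum(len(s) for s in raft(v1, v2)), "characters assessed as one block")
```

The combined length of the chained sections is what would be used when deciding which match is biggest, while the three literal pieces stay separate in the fragment list.
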
Working with any language

Collation programs are usually designed to work on texts in the language of their author. Most of them work OK with European languages that can be expressed in 8-bit characters, but fail on anything more complex. Nowadays everyone uses Unicode, in which many characters such as accented letters and some punctuation become multi-byte sequences when encoded as UTF-8, the most widespread encoding in modern computing. This plays havoc with programs designed only to work with 8-bit characters, so I decided from the start to make nmergec work with 16-bit Unicode. That way it could handle Chinese and Indian languages, ancient Greek, Russian, etc with ease. To do this I had to use ICU, a library maintained by IBM and now the embodiment of the Unicode standard in code form. ICU can convert just about any encoding into another. An MVD can be in any encoding you like, but internally nmergec uses 16-bit Unicode for all comparisons. Unless you are editing a text in an obscure dead language like ancient Phoenician you don't need more than 16 bits, and even then nmergec will continue to work because it uses UTF-16, so characters beyond the 16-bit range will just get split into pairs of 16-bit units (surrogate pairs). I think multi-language handling is very important and as far as I am aware no other merging/collation program supports it.
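
The difference between the encodings is easy to see with a few sample characters. The sketch below uses Python's built-in codecs rather than ICU, purely for illustration; the helper names utf8_bytes and utf16_units are my own.

```python
def utf8_bytes(ch):
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch):
    """Number of 16-bit code units the character occupies in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


samples = [
    ("a", "Latin letter a"),
    ("\u00e9", "e with acute accent"),
    ("\u4e2d", "a Chinese character"),
    ("\U00010900", "Phoenician letter alf"),
]

for ch, name in samples:
    print(name, utf8_bytes(ch), utf16_units(ch))
```

Everything up to and including the Chinese character fits in a single 16-bit unit, whereas in UTF-8 the same characters take between one and three bytes. Only the Phoenician letter needs two units in UTF-16.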

Immortal code

When Thucydides wrote his History of the Peloponnesian War in the fifth century BC he said he was creating a "possession for ever", and it turned out to be true. But when writing computer programs that is a hard act to follow. Technologies come and go, and any program written today is likely to be useless in two years' time. I didn't want to keep coming back to my program just because some dimwit in some mega company thought it cool to change their programming language or tools just to roll out some new feature. So I wrote nmergec in C, one of the oldest and still one of the most popular programming languages on the planet. Why? Who uses it any more? Surely C will soon die because it is so old. C is popular for several reasons, which happen also to be my reasons for choosing it:

C is incredibly stable. I have programs on my computer written in 1980 in C that still compile flawlessly. C is the language of Linux, OSX, iOS, BSD and Android, and all other flavours of UNIX, and variations of it are used to write mobile phone software and virtually all device drivers. C is going to be around for a very long time.

Also C is as fast as anything out there, and I needed speed. C is compiled directly into machine code, whereas almost all the new fancy languages like Node, Python, Haskell, Ruby, etc. are basically scripting languages that get compiled every time they run. That makes them about 100 to 1000 times slower than C. True, Java may be a bit faster in some test cases, but I have found that C is dramatically faster than Java when you have a complex program.

C allocates and frees memory as it goes. Java uses a garbage collector, which slows it down at crucial moments, and also makes it a memory-hog. You have to declare in advance how much memory you want to use in Java for your program to run, but in my case this is inconvenient because it depends on the size of the files. Since Java wastes so much memory this was the main reason it was limited to 70K files in the old nmerge.

Almost any language already in use can link to a C program. In Java you have JNI, and parts of Java itself already use C for speed. In PHP you can define an extension which is written in C. The same goes for most other languages. C is versatile, and will compile in any programming environment.

You may say that few people will be able to understand what I have written because it is complex and in a language not much used in the digital humanities. True, but those who are serious about extending or repairing the tool will either make the effort or hire someone who has the skill. I have documented it well, it is carefully organised into manageable chunks, and it has a comprehensive test suite. I can do no more.

Size matters

Cultural heritage texts can be big things. Often alterations between versions can span entire works, such as Joseph Furphy's Such is Life:

Notice how the author's original manuscript was revised by cutting out two entire chapters, which were replaced by two new ones in the printed book, but then the excised chapters were revised and published as two separate books. Variation occurs on this scale in real-world examples, not just at the microscopic level of individual sentences. Nmergec was designed to compare texts of several megabytes. I haven't tested the whole program yet on files that large, at this time of writing, but it is designed to do it. No other comparison program can handle variation over spans this large.

NMergeC is modular

One of the problems with the old nmerge was that it kept growing as more and more features had to be added. I needed a way to generate stemmas or trees showing the ancestors and children of versions. I also needed to produce an apparatus aligned by words, to compute the degree of difference between each version. So in nmergec I decided to break each of the program's functions into modules. The nmergec program itself just loads and saves MVDs, nothing more. mvd_add adds one version to an existing MVD. There are other modules, as yet incomplete, which do a great variety of tasks, but operate on MVDs. If you want nmergec to do something new just write a plugin. There is a simple application-programming interface to do this, though I haven't documented it yet. The following plugins are already defined or will soon be adapted from parts of the old nmerge:

  • create a new empty MVD
  • generate a table suitable for an apparatus
  • compute the variants of a section of one version
  • add a new version
  • delete one version and its contents from an MVD
  • list the versions of an MVD
  • compute a phylogenetic tree using neighbour-join
  • save an MVD as an archive of separate versions
  • read one version
  • read in an archive and convert to an MVD
  • compare two versions
  • search for a string
  • rename an MVD
  • replace an existing version with a new one

Keeping it simple (or the KISS principle)

One way to visualise an MVD is as a deck of cards. Each textual fragment corresponds to the value of a card, and the suit corresponds to the set of versions to which the fragment belongs. We also sort the deck into a particular order, like a card-shark cheating in some way. NMergec differs from nmerge in that it operates directly on the cards or fragments, and never builds an explicit graph. Most of the complexity of nmerge was caused by this simple difference. NMergec can be understood as a card-shuffling program, that splits cards into segments and shuffles them into a very particular order. The variant graph still exists, but it is implicit, and doing it this way is a lot simpler. For example, let's say you have 2,000 versions of a work. That is quite possible with New Testament texts. If you create a variant graph of even a short section of such a work it is incomprehensible. It is also, unsurprisingly, just as complex for the programmer to make sense of. But a list of fragments remains simple no matter how many versions are added to it, and never becomes a "huge hairy mess".

NB: "nmergec" is pronounced "en-merge-see", not "en-merge-ec"

Sunday, 23 February 2014

Rewriting TILT

Linking areas on an image to segments of text so you can highlight one or the other and show what fragment of an image produced what transcription sounds like a crazy pedantic idea. At least that is what I thought when I first heard about it. But the fact is, if you want to display a facsimile image next to a transcription the user has no easy way to make the correlation between what corresponds to what. They spend their whole time scrolling up and down, scanning the image with their eyes and then going back to the text, losing where they were on the image and starting again, etc. Following the HCI notion of least 'excise' or user effort to get a task done, text-to-image links make it easy to read a manuscript facsimile. Not so crazy after all!

TILT (Text-Image-Linking Tool) was a pun on TILE, and like TILE was intended to allow for semi-automatic selection of areas on an image and linking them to segments of text. The problem with TILE was that it relied too much on Javascript. Javascript may be the up and coming child of the Web but it is still relatively immature and slow. TILT decided instead to use Java in the form of an applet, which would have access to all the amazing image manipulation tools of the Java class library for free, and be able to do things like highlight areas as the mouse moved over it, to resize regions, and to recognise lines when the image was tilted etc (hence TILT, get it?). The only problem with this design is that the Java gets downloaded to the browser, and the compatibility of browsers with applets is not good. Also that puts a strain on the Internet link, and then drawing, refreshing and resizing become a problem.

TILT2

TILT2 comes to the rescue. I find that second goes at a design often work better because you have behind you the experience of the initial failures. TILT2 gets around the problems of TILE and TILT1 by using HTML5 to do all the drawing and Javascript to handle the clicking and dragging events. The Java is still used to do the image transforms and word-detection but it is all done on the server, and only the results are sent down the wire as JSON, or they get handled directly in the browser. Here's how the dataflow and various components will look in TILT2. (This is just a design for now):

The images are stored in Mongo's grid-fs. In the same database are stored the plain text and markup overlays called cortex and corcode respectively. CorCode has the advantage over standard standoff markup or directly embedded markup in that you can overlay any number of markup sets onto the same text and it produces valid HTML. The HTML then gets sent to the browser, which has a simple window with two panels. The left one shows the image inside a HTML5 canvas. The right hand side has a transcription of the image's contents. As the user moves over regions occupied by words in the image those regions turn pink, and a corresponding region in the text on the right is highlighted also. It also works vice-versa.

Event-handling

This works by drawing the highlighted regions using javascript. Mouse-movement events are captured and sent to the HTML canvas object, which responds to Javascript commands. The image itself is a bare-bones custom image representation that was originally downloaded from the server, and created by merging the CorCode for the image with the image itself (see top of drawing). The right hand side is likewise composed of structural markup and spans that are activated when the mouse moves over them too. The markup to achieve this is likewise supplied from the CorCode, which points to regions inside the plain text (CorTex).

When the user drags the corner of a region on the left, javascript is used to track its movements, and highlighting is instant. The user can also select a region on the left for recognition by single-clicking with the region selection tool. On the right hand side a corresponding piece of text can be selected by just dragging the mouse over the text. (ierange is used to get the text selection). When the user clicks the "recognise" button in the toolbar the contents of the page are sent up to the server for analysis, and word-regions on the left are matched up with words on the right. The results are then computed into CorCode/CorTex form and sent back to the browser for rendering.

TILT Automation

The really cool bit in TILT1 is that the user can quickly refine the guesstimate made by the server by selecting a region already recognised on each side. Re-recognising does the same thing but starts at two known good end-points on either side. In the most fine-grained case one word on each side could be chosen, but in most cases great swathes of text can be selected in one go. TILT1 uses a clever alignment algorithm adapted from textual diff tools to align the word-shapes on the left with words of corresponding length on the right by taking account of their order. When the user is satisfied with the alignment he/she can press "next" or "prev" to go on to a new page or to refine a previously done page, and the work is automatically saved.

The problem with this is that it is still just a design. But I need it for two projects: the De Roberto I Viceré and the Charles Harpur critical archive, both of which have extensive manuscript facsimiles to complement the texts. Without automation such text image alignment would be infeasible on this scale. The thing I like about this design is that each software component does what it is good at, and delegates the rest to the other components.

Yes, it will require HTML5, but all modern browsers support this. Without HTML5 it becomes very very messy to do the drawing one way on IE (using VML) and another way on other browsers. If it doesn't work for you and you need it, just update your browser or if you can't then buy a new computer or tablet. I haven't got time to support every damn browser out there.

Tuesday, 14 January 2014

Decoupling TEI from XML

I thought it was about time that someone surveyed the 553 tags now in the TEI XML P5 tagset version 2.6.0. An interesting question is where is TEI today, where is it going and how big will it get? Technology changes all the time, and XML is no exception, but TEI is so tightly coupled to XML it seems inconceivable that it could have an existence outside of XML. But it is possible, and moreover, necessary if it is ever to be brought up to date with the digital textual models of 2014, rather than those of 1987. To determine how to do that it is essential to first discover exactly what TEI is. My idea is to work out which tags describe essential attributes of cultural heritage texts that cannot be found in other widely used metadata standards. Also, by classifying the tags into genres and functional groups (in a user-centric way) they can be reassessed en masse.

So far the survey has covered all the TEI P5 2.6.0 tags up to the end of o, which is 62% of the whole. Of these only 60 tags pass the criteria of usefulness specified here, and 288 do not. This sample suggests that all the tags will fall into a number of simple categories:

  1. Metadata ...
    This is data about the document as a whole. So far, with one or two exceptions, I have found that all the TEIHeader information can be adequately described using already existing metadata tags from widely used schemes such as Dublin Core, MODS, foaf, EAD. In modern repositories metadata about documents is stored separately from the documents it describes, so it can be interoperable across disciplines. Once it was removed there would be no point in retaining the TEI format. The librarians and archivists of the world would quickly translate it into more standardised forms. So I think that the TEIHeader metadata tags can all be removed. The metadata tags account for a quarter of the entire TEI schema.
  2. Specialised tags for non-literary genres ...
    Geographical data, linguistics codes and screenplays are already covered much better outside of TEI. It seems pointless to try to compete with these more detailed schemas, so all these tags can go.
  3. Tags for defining stuff that should be external to the document ...
    For example, taxonomies and character definitions.
  4. Programmatic data ...
    For example, graphs and tags for documenting the TEI Guidelines themselves have no place in a humanities markup scheme.
  5. Variant tags ...
    (app, rdg, add, del, sic, corr, abbr, expan, orig, reg etc.) can be re-expressed as coherent layers. That way each version or correction or editorial layer can be marked up separately and if needed compared with one another.
  6. Annotation tags ...
    Such as <gloss> and <note> contain comments on the text and should be defined via an external annotation system.
  7. Topographical layout tags ...
    E.g. <line> and <surface> are designed to present a drawing of a manuscript and to connect it to a transcription. And yet many vector graphics drawing formats already exist, such as SVG. This markup strategy leads to extreme complexity and to the need to encode the text twice: once topographically and once textually. This is quite unnecessary since linking areas on a facsimile can be accomplished more simply by using standoff properties.

Graphs

How many tags are actually useful for encoding cultural heritage texts, and how many are not?

This graph shows the breakdown so far into broad categories.

Increasing efforts to add new tags seem to have less and less success.

TEI howlers

  1. Economising on element names ...
    • <editor role="illustrator">John Tenniel</editor> In what sense is an illustrator any kind of editor? The definition of editor says that this can be "editor, compiler, translator, etc."
  2. Duplicating elements ...
    • In P4 there were two <desc> elements. The first was defined in P2 as "a description of a character or character form" but in P4 another <desc> element was added: "description of the purpose and application for an element, attribute, or attribute value" In P5 the first kind of <desc> was quietly dropped.
    • In P2 <biblScope> described "a range of pages or a volume number". In P5 2.3.0 <citedRange> was added with exactly the same definition but a slightly different context, although both elements could appear within a <bibl> element. They should have just defined a single element <range> which would mean different things in different containers.
    • <handDesc> contains "a description of all the different kinds of writing used in a manuscript". <handNotes> on the other hand, "contains one or more handNote elements documenting the different hands identified within the source texts". The only difference appears to be that <handDesc> can have a plain text description of all the hands as well as a breakdown of individual ones. So why oh why do we need <handNotes>? The duplication is particularly striking as both <handNotes> and <handDesc> were introduced at the same time, and nobody noticed.
    • <listChange> introduced in P5 2.0.0 contains a list of changes in the header, but so does the older <revisionDesc>. All that was needed was a slight redefinition of <revisionDesc>, or maybe nobody noticed that it was already there.
    • P5 introduced a number of <list-> elements, one for each type of content. listNym, listOrg, listPerson, listPlace. In that case why not also listShopping or listToDo? Surely all that is needed is one list element with different contents. These elements are designed to make writing XSLT scripts easier, not to answer a real user need.
    • P5 2.0.0 introduced <mod>, for grouping <add> and <del>, but there was already <subst>, not to mention <rdg> (within <app>) for the same purpose.
    • P5 introduced <msItem>: "an individual work or item within the intellectual content of a manuscript or manuscript part", but also msItemStruct, which is the same thing in more structured form, and also msPart, which describes "an originally distinct manuscript or part of a manuscript, now forming part of a composite manuscript". Aren't all these things basically the same?
  3. Changing Names in the "standard" ...
    • <witList> (P4) was changed to <listWit> in P5 1.0.0 to make it uniform with the new list elements.
    • <itype> from P3-P4 was changed to <iType> in P5.
  4. Superfluous elements ...
    • <monogr> was discussed in P1, added in P2. It is a "monograph" – a type of bibliographic item. But where are the corresponding bibliographic types for article, thesis, inchapter, inproceedings and electronic etc.?

Survey data

Here's my survey so far. Just click on the links to expand the items. This is my basis for the data summarised in the graphs, broken down on an element by element basis.
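
As a rough illustration of how records in this format could be tallied by type, here is a minimal Python sketch. It is not the script used to produce the graphs; the function names `parse_entries` and `tally_types` are hypothetical, and the sample is just three entries copied from the list, with the bullets and trailing "..." removed. It assumes that an entry with two types (e.g. "Literary; Metadata") counts once under each.

```python
from collections import Counter

# Three entries copied from the survey, bullets and "..." stripped (assumed input shape).
SAMPLE = """\
ab
Description: anonymous block
Since: P4
Type: General
Reason: No semantic content
abbr
Description: abbreviation
Since: P1
Type: Variant
Reason: Usually coupled with expansion and so defines alternatives; better represented externally
address
Description: postal address
Since: P2
Type: Literary; Metadata
Reason: in metadata use ead:addressLine; in text use <l>
"""

# Field labels that occur in the survey records.
FIELDS = ("Description", "Since", "Type", "Reason",
          "May contain", "Contained by", "Meaning")


def parse_entries(text):
    """Turn the flat record text into a list of dicts, one per element."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            # A "Field: value" line belongs to the most recent element.
            if entries:
                entries[-1][key] = value.strip()
        else:
            # Any other non-empty line starts a new element record.
            entries.append({"name": line})
    return entries


def tally_types(entries):
    """Count elements per type; multi-typed entries count once per type."""
    counts = Counter()
    for entry in entries:
        for part in entry.get("Type", "").split(";"):
            if part.strip():
                counts[part.strip()] += 1
    return counts


entries = parse_entries(SAMPLE)
counts = tally_types(entries)
for type_name, n in sorted(counts.items()):
    print(type_name, n)
```

Counting a two-typed element under both headings is only one possible convention; counting it under its first type alone would give different totals.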

  • ab to availability
    • ab ...
      Description: anonymous block
      Since: P4
      Type: General
      Reason: No semantic content
    • abbr ...
      Description: abbreviation
      Since: P1
      Type: Variant
      Reason: Usually coupled with expansion and so defines alternatives; better represented externally
    • abstract ...
      Description: contains a summary or formal abstract prefixed to an existing source document by the encoder
      Since: P5 2.6.0
      Type: Metadata
      May contain: 11
      Contained by: 1
      Reason: use dcterms:abstract; mods:abstract etc
    • accMat ...
      Description: describes material accompanying a manuscript
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:relatedMaterial
    • acquisition ...
      Description: Description of how an MS was acquired
      Since: P5 1.0
      Type: Metadata
      Reason: dcterms:provenance or ead:acqinfo
    • activity ...
      Description: informal description of what a participant in a language interaction is doing other than speaking
      Since: P2
      Type: Linguistic
      Reason: linguistic code
    • actor ...
      Description: name of actor
      Since: P1
      Type: Literary
    • add ...
      Description: encodes an addition to a manuscript
      Since: P2
      Type: Variant
      Reason: creates alternatives; better represented externally
    • additional ...
      Description: groups additional information, combining bibliographic information about a manuscript, or surrogate copies of it with curatorial or administrative information
      Since: P5 2.6.0
      Type: Metadata
      May contain: 3
      Contained by: 2
      Reason: use mods:note or ead:note
    • additions ...
      Description: contains a description of any significant additions found within a manuscript, such as marginalia or other annotations
      Since: P5 2.6.0
      Type: Metadata
      May contain: 208
      Contained by: 1
      Reason: use ead:change
    • addName ...
      Description: Nickname or epithet
      Since: P3
      Type: Linguistic
      Reason: Use foaf:nick
    • addSpan ...
      Description: longer version of add
      Since: P2
      Type: Programmatic
      Reason: creates alternatives; replace with layers; same as add
    • additional ...
      Description: groups additional information about manuscripts
      Since: P5
      Type: Metadata
      Reason: just gives structure; not needed in ead
    • additions ...
      Description: describes nature of additions to an MS
      Since: P5
      Type: Metadata
      Reason: move to metadata about layers, e.g. dc:description
    • addrLine ...
      Description: line of an address
      Since: P3
      Type: Literary
      Reason: Unnecessary: better expressed by generic line within address
    • address ...
      Description: postal address
      Since: P2
      Type: Literary; Metadata
      Reason: in metadata use ead:addressLine; in text use <l>
    • adminInfo ...
      Description: describes the present custody and availability of the manuscript
      Since: P5
      Type: Metadata
      Reason: use dcterms:mediator, dcterms:provenance, dcterms:available etc.
    • affiliation ...
      Description: an informal description of a person's present affiliated institution
      Since: P1
      Type: Metadata; General
      Reason: use foaf:organization
    • age ...
      Description: a person's age
      Since: P2
      Type: Metadata; Linguistic
      Reason: use foaf:age; linguistic code
    • alternate ...
      Description: an alternation of references
      Since: P5 2.6.0
      Type: Programmatic
      May contain: 5
      Contained by: 3
      Reason: only useful for XML or TEI documentation
    • alt ...
      Description: alternation
      Since: P2
      Type: Programmatic
      Reason: purely programmatic; no semantic content; creates alternatives
    • altGrp ...
      Description: group of alts
      Since: P3
      Type: Programmatic
      Reason: purely programmatic
    • altIdent ...
      Description: supplies an XML name in some language
      Since: P5 1.0.0
      Type: Programmatic
      Reason: Purely programmatic construct; only serves needs of XML
    • altIdentifier ...
      Description: earlier identifier for an item
      Since: P5 1.0.0
      Type: Metadata
      May contain: 8
      Contained by: 2
      Reason: use dcterms:alternative; dc:identifier; ead:otherfindaid
    • am ...
      Description: specifies curtailed characters or position in abbreviation that are later expanded
      Since: P5 1.0.0
      Type: Variant
      Reason: better computed by software; very fine detail.
    • analytic ...
      Description: contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication
      Since: P2
      Type: Metadata
      Reason: Better described via existing metadata standards.
    • anchor ...
      Description: attaches an identifier to a point within a text
      Since: P1
      Type: Programmatic
      Reason: serves only XML function; <a> tag already in HTML; gives no meaning to text
    • app ...
      Description: an apparatus criticus entry
      Since: P1
      Type: Variant
      Meaning: None
      Reason: Better represented externally; creates alternatives
    • appInfo ...
      Description: information about an application
      Since: P5 1.0.0
      Type: Metadata
      Reason: doesn't apply meaning to text; only services needs of applications.
    • application ...
      Description: information about an application
      Since: P5
      Type: Metadata
      Reason: contained by appInfo element; only services needs of XML.
    • arc ...
      Description: arc in a graph
      Since: P2
      Type: Programmatic
      Reason: Creates potentially noncomputable structures; better described by existing languages e.g. SVG; creates alternatives; doesn't assign meaning to text
    • argument ...
      Description: argument at start of a chapter
      Since: P2
      Type: Literary
    • att ...
      Description: XML attribute name
      Since: P2
      Type: Programmatic
      Reason: off topic; only describes XML documents
    • attDef ...
      Description: XML attribute definition
      Since: P2
      Type: Programmatic
      Reason: off topic; only describes XML documents
    • attList ...
      Description: attribute list for XML/SGML
      Since: P2
      Type: Programmatic
      Reason: off topic; only describes XML documents
    • attRef
      Description: points to attribute definition
      Since: P2
      Type: Programmatic
      Reason: off topic; only describes XML documents
    • author ...
      Description: author of a work
      Since: P1
      Type: Metadata
      Reason: use dc:creator
    • authority ...
      Description: person or other agency responsible for making a work available
      Since: P1
      Type: Metadata
      Reason: use dcterms:publisher
    • availability ...
      Description: supplies information about the availability of a text
      Since: P2
      Type: Metadata
      Reason: use dcterms:accessRights
  • back to byline
    • back ...
      Description: classifies part of a printed book as the "back matter"
      Since: P1
      Type: Programmatic
      Reason: Best represented by a separate file; doesn't really apply meaning to text; part of the old all-in-one file approach (IBM's GML).
    • bibl ...
      Description: Describes a bibliographic reference
      Since: P1
      Type: General
    • biblFull ...
      Description: describes a full bibliographic reference in the TEI Header
      Since: P2
      Type: Metadata
      Reason: better represented by external metadata standards, e.g. mods.
    • biblScope ...
      Description: describes a range of pages or a volume number
      Since: P2
      Type: General
      Reason: rename as scope or range; same as citedRange
    • biblStruct ...
      Description: describes a bibliographic reference
      Since: P2
      Type: General
      Reason: same as bibl except that it refers to a different XML content model (not mixed).
    • bicond ...
      Description: defines a biconditional feature-structure constraint; both consequent and antecedent are specified as feature structures or groups of feature structures; the constraint is satisfied if both subsume a given feature structure, or if both do not.
      Since: P3
      Type: Linguistic
      Reason: linguistic code; application-specific; similar to alt; off topic
    • binary ...
      Description: binary true or false value
      Since: P3
      Type: Linguistic; Programmatic
      Reason: linguistic code; programmatic; off topic; doesn't describe text.
    • binaryObject ...
      Description: base64 or other encoding of a binary object such as an image
      Since: P5 1.0.0
      Type: Programmatic
      Reason: doesn't describe text; programmatic; better supported in the host technology (e.g. <img> in HTML).
    • binding ...
      Description: binding of a manuscript
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physfacet type="binding"
    • bindingDesc ...
      Description: group binding descriptions
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physdesc
    • birth ...
      Description: description of birth
      Since: P2
      Type: General
    • bloc ...
      Description: group of countries such as Africa, EU
      Since: P5 1.0.0
      Type: Geographic
      Reason: use mods:region or mods:continent or GML
    • body ...
      Description: main body of text
      Since: P1
      Type: Programmatic
      Reason: Like front and back matter this model is borrowed from IBM's GML; does not describe a textual property; depends on the technology for rendering, e.g. HTML already has a body element.
    • broadcast ...
      Description: Describes a broadcast used as the source of a spoken text
      Since: P2
      Type: Linguistic; Metadata
      Reason: Linguistic code; metadata already served by existing standards such as PBCore.
    • byline ...
      Description: statement of responsibility given for a work on its title page
      Since: P2
      Type: Literary
  • c to custodialHist
    • c ...
      Description: character
      Since: P1
      Type: Linguistic
      Reason: redundant, since content is already a character; attributes, if any, should be promoted to character properties instead
    • caesura ...
      Description: point where a verse may be divided
      Since: P1
      Type: Literary
    • calendar ...
      Description: describes a calendar in header
      Since: P5
      Type: Metadata
      Reason: no standard metadata term for this element
    • calendarDesc ...
      Description: calendar description container
      Since: P5 2.0.0
      Type: Metadata
      Reason: Just a group container for calendar.
    • camera ...
      Description: camera description in screenplays
      Since: P2
      Type: Screenplay
      Reason: off topic; better representation of screenplays via Fountain language.
    • caption ...
      Description: caption stage direction in screenplays
      Since: P2
      Type: Screenplay
      Reason: off topic; better representation of screenplays via Fountain language.
    • case ...
      Description: grammatical case of a word
      Since: P3
      Type: Linguistic
      Reason: Linguistic code; programmatic; empty element doesn't qualify text.
    • castGroup ...
      Description: groups cast members
      Since: P2
      Type: Literary
    • castItem ...
      Description: entry in a castList
      Since: P2
      Type: Literary
    • castList ...
      Description: contains a single cast list or dramatis personae
      Since: P2
      Type: Literary
    • catchwords ...
      Description: describes the system used to ensure correct ordering of the quires making up a codex or incunable, typically by means of annotations at the foot of the page
      Since: P5
      Type: Metadata
      Reason: can be recorded via EAD <physfacet type="catchwords">.
    • category ...
      Description: describes an individual descriptive category in a user-defined taxonomy
      Since: P1
      Type: Metadata
      Reason: taxonomy should be external; off-topic; for user-defined taxonomies only
    • catDesc ...
      Description: describes an individual descriptive category
      Since: P2
      Type: Metadata
      Reason: taxonomy should be external; off-topic
    • catRef ...
      Description: refers to (apparently) user-defined taxonomies
      Since: P2
      Type: Metadata
      Reason: taxonomy should be external; off-topic
    • cb ...
      Description: marker for column break
      Since: P3
      Type: General
    • cell ...
      Description: general table cell
      Since: P2
      Type: General
      Reason: general formatting tag available in HTML as <td>
    • certainty ...
      Description: certainty of some other property
      Since: P1
      Type: Literary
    • change ...
      Description: describe change in revised documents
      Since: P2
      Type: General
      Reason: off topic; should be a comment in a revision control system
    • channel ...
      Description: describes the medium or channel by which a text is delivered
      Since: P2
      Type: Linguistic; Metadata
      Reason: linguistics code and concept; use dcterms:medium
    • char ...
      Description: defines a non-standard character
      Since: P5 1.0
      Type: Programmatic
      Reason: off topic; should be external
    • charDecl ...
      Description: groups char definitions
      Since: P5 1.0
      Type: Programmatic
      Reason: off topic; should be external
    • charName ...
      Description: name of a user-defined character
      Since: P5 1.0
      Type: Programmatic
      Reason: off topic; should be external
    • charProp ...
      Description: property of a user-defined character
      Since: P5 1.0
      Type: Programmatic
      Reason: off topic; should be external
    • choice ...
      Description: choice of encodings
      Since: P5 1.0
      Type: Variant
      Reason: only groups; represent via external layers
    • cit ...
      Description: groups quotation and reference
      Since: P2
      Type: General
    • citedRange ...
      Description: defines the range of cited content, often represented by pages or other units
      Since: P5 2.3.0
      Type: General
      Reason: copy of biblScope; should be renamed scope or range
    • cl ...
      Description: represents a grammatical clause
      Since: P3
      Type: Linguistic
      Reason: linguistic code
    • classCode ...
      Description: a subject classification code
      Since: P2
      Type: Metadata
      Reason: use dcterms:subject or mods:classification
    • classDecl ...
      Description: declares a number of taxonomies
      Since: P2
      Type: Metadata
      Reason: obsolete; use namespace declarations instead
    • classes ...
      Description: for documenting TEI elements
      Since: P2
      Type: Programmatic
      Reason: TEI-specific; off topic
    • classRef ...
      Description: for documenting TEI elements
      Since: P5 1.7.0
      Type: Programmatic
      Reason: TEI-specific; off topic
    • classSpec ...
      Description: specification of a TEI element
      Since: P5 1.0.0
      Type: Programmatic
      Reason: TEI-specific; off topic
    • climate ...
      Description: description of a climate
      Since: P5 1.0.0
      Type: Geographic
      Reason: off topic; use GML (geography markup language)
    • closer ...
      Description: groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter.
      Since: P2
      Type: Literary
    • code ...
      Description: contains literal code from some formal language such as a programming language
      Since: P5 1.0.0
      Type: Programmatic
      Reason: off topic; use code tag in HTML
    • collation ...
      Description: contains a description of how the leaves or bifolia are physically arranged
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physfacet type="collation"
    • collection ...
      Description: contains the name of a collection of manuscripts
      Since: P5 1.0.0
      Type: Metadata
      Reason: use dcmitype:collection
    • colloc ...
      Description: contains a collocate of the headword
      Since: P3
      Type: Linguistic
      Reason: linguistic code
    • colophon ...
      Description: contains the colophon of a manuscript item
      Since: P5 1.0.0
      Type: Literary
    • cond ...
      Description: conditional feature-structure constraint
      Since: P3
      Type: Linguistic
      Reason: linguistic code
    • condition ...
      Description: description of the physical condition of the manuscript
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physdesc or mods:physicalDescription
    • constitution ...
      Description: the internal composition of a text, for example as fragmentary, complete, etc.
      Since: P2
      Type: Linguistic; Metadata
      Reason: linguistic code
    • constraint ...
      Description: the formal rules of a constraint
      Since: P5 1.4.1
      Type: Programmatic
      Reason: off topic
    • constraintSpec ...
      Description: contains a constraint, expressed in some formal syntax, which cannot be expressed in the structural content model
      Since: P5 1.4.1
      Type: Programmatic
      Reason: off topic
    • content ...
      Description: contains the text of a declaration for the schema documented
      Since: P5 1.0.0
      Type: Programmatic
      Reason: off topic; TEI self-documentation
    • corr ...
      Description: the corrected form of an erroneous piece of text
      Since: P3
      Type: Variant
      Reason: creates branches in the text; use layers instead
    • correction ...
      Description: states how and under what circumstances corrections have been made
      Since: P1
      Type: Metadata
      Reason: no exact match, use ead:odd or ead:edition
    • country ...
      Description: name of a country
      Since: P2
      Type: General
    • creation ...
      Description: information about the creation of a text
      Since: P2
      Type: Metadata
      Reason: use ead:creation
    • cRefPattern ...
      Description: specifies an expression and replacement pattern for transforming a canonical reference into a URI
      Since: P5 1.0.0
      Type: Programmatic
      Reason: off topic; supplies programming data only
    • custEvent ...
      Description: describes a single event during the custodial history of a manuscript
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:acqinfo
    • custodialHist ...
      Description: contains a description of a manuscript's custodial history, either as running prose or as a series of dated custodial events
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:custodHist
  • damage to domain
    • damage ...
      Description: contains an area of damage to the text witness
      Since: P3
      Type: Literary
    • damageSpan ...
      Description: marks a longer span of damaged text
      Since: P5 1.0.0
      Type: Literary
      Reason: same as damage; only added to overcome XML's limitation on overlap
    • datatype ...
      Description: specifies the declared value for an attribute.
      Since: P3
      Type: Programmatic
      Reason: programmatic; only for documenting TEI itself
    • date ...
      Description: contains a date in any format
      Since: P1
      Type: Literary
    • dateline ...
      Description: description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer
      Since: P2
      Type: Literary
    • death ...
      Description: contains information about a person's death, such as its date and place
      Since: P5 2.6.0
      Type: General
      May contain: 157
      Contained by: 2
    • decoDesc ...
      Description: describes the decoration of a manuscript.
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physfacet; use an actual facsimile; what's computable here?
    • decoNote ...
      Description: contains a note describing either a decorative component of a manuscript, or a fairly homogenous class of such components
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physfacet; use an actual facsimile; what's computable here?
    • def ...
      Description: contains definition text in a dictionary entry
      Since: P3
      Type: Linguistic
      Reason: off topic; linguistic code
    • default ...
      Description: represents the value part of a feature-value specification which contains a defaulted value
      Since: P2
      Type: Linguistic
      Reason: off topic; linguistic code
    • defaultVal ...
      Description: specifies the default declared value for an attribute
      Since: P5 1.0.0
      Type: Programmatic
      Reason: only for formatting TEI documentation
    • del ...
      Description: contains a deletion
      Since: P2
      Type: Variant
      Reason: use layers instead; creates alternatives
    • delSpan ...
      Description: contains a longer deletion that "spans" other elements
      Since: P2
      Type: Variant
      Reason: creates non-computable structures; same as del; creates alternatives; use layers instead
    • depth ...
      Description: contains a measurement measured across the spine of a book or codex
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physfacet
    • derivation ...
      Description: describes the nature and extent of originality of this text
      Since: P2
      Type: Metadata; Linguistic
      Reason: use dcterms:provenance; linguistic code
    • desc ...
      Description: originally "a description of a character or character form", now "description of the purpose and application for an element, attribute, or attribute value". In P4 both versions existed with the same name!
      Since: P2; P4
      Type: Programmatic
      Reason: Only TEI documentation; off topic
    • dictScrap ...
      Description: encloses a part of a dictionary entry in which other phrase-level dictionary elements are freely combined
      Since: P4
      Type: Linguistic
      Reason: off topic; linguistic code; poorly defined
    • dim ...
      Description: contains any single measurement forming part of a dimensional specification of some sort
      Since: P5 2.6.0
      Type: Metadata
      May contain: 1
      Contained by: 224
      Reason: use ead:physfacet; abstract element subsumes former specific elements
    • dimensions ...
      Description: contains physical dimensions
      Since: P5 1.0.0
      Type: Metadata
      Reason: use ead:physfacet
    • distinct ...
      Description: marks a distinct form of expression, e.g. slang
      Since: P2
      Type: Linguistic
      Reason: linguistic code; like <hi> this is a catch-all tag; the real meaning is in the attribute, which should be the name of a set of such elements or properties
    • distributor ...
      Description: supplies the name of a person or other agency responsible for the distribution of a text
      Since: P1
      Type: Metadata
      Reason: use dc:publisher
    • district ...
      Description: contains the name of any kind of subdivision of a settlement, such as a parish, ward, or other administrative or geographic unit
      Since: P1
      Type: Geographic
      Reason: off topic; use GML (geography markup language) or similar
    • div ...
      Description: contains a subdivision of the front, body, or back of a text
      Since: P1
      Type: General
      Reason: devoid of any real meaning; should be provided by target technology e.g. HTML
    • div1 ...
      Description: a "first level" div
      Since: P1
      Type: General
      Reason: superfluous; not needed in HTML; should be provided by target language
    • div2 ...
      Description: a "second level" div
      Since: P1
      Type: General
      Reason: superfluous; not needed in HTML; should be provided by target language
    • div3 ...
      Description: a "third level" div
      Since: P1
      Type: General
      Reason: superfluous; not needed in HTML; should be provided by target language
    • div4 ...
      Description: a "fourth level" div
      Since: P1
      Type: General
      Reason: superfluous; not needed in HTML; should be provided by target language
    • div5 ...
      Description: a "fifth level" div
      Since: P1
      Type: General
      Reason: superfluous; not needed in HTML; should be provided by target language
    • div6 ...
      Description: a "sixth level" div
      Since: P1
      Type: General
      Reason: superfluous; not needed in HTML; should be provided by target language
    • div7 ...
      Description: a "seventh level" div
      Since: P1
      Type: General
      Reason: superfluous; not needed in HTML; should be provided by target language
    • divGen ...
      Description: indicates the location at which a textual division generated automatically by a text-processing application is to appear.
      Since: P2
      Type: Programmatic
      Reason: not needed for transcription
    • docAuthor ...
      Description: author of the document
      Since: P2
      Type: General
      Reason: says basically the same thing as author; author already in document metadata
    • docDate ...
      Description: contains the date of a document, as given (usually) on a title page
      Since: P2
      Type: General
      Reason: says basically the same thing as date; date already in document metadata
    • docEdition ...
      Description: contains an edition statement as presented on a title page of a document
      Since: P2
      Type: General
      Reason: says basically the same thing as edition; edition already in document metadata
    • docImprint ...
      Description: contains the imprint statement (place and date of publication, publisher name), as given (usually) at the foot of a title page
      Since: P2
      Type: Literary
    • docTitle ...
      Description: contains a title for any kind of work
      Since: P2
      Type: General
      Reason: says basically the same thing as title; title already in document metadata
    • domain ...
      Description: describes the most important social context in which the text was realized or for which it is intended
      Since: P2
      Type: Linguistic; Metadata
      Reason: linguistic code; use dc:subject
  • edition to extent
    • edition ...
      Description: describes the particularities of one edition of a text
      Since: P1
      Type: Metadata
      Reason: use ead:edition or mods:edition
    • editionStmt ...
      Description: groups information relating to one edition of a text
      Since: P2
      Type: Metadata
      Reason: use ead:editionstmt or mods:edition
    • editor ...
      Description: contains a secondary statement of responsibility for a bibliographic item
      Since: P1
      Type: Metadata
      Reason: use mods:role or ead:author
    • editorialDecl ...
      Description: provides details of editorial principles and practices applied during the encoding of a text
      Since: P2
      Type: Metadata
      Reason: use ead:processinfo or mods:recordInfo
    • education ...
      Description: contains a description of the educational experience of a person
      Since: P2
      Type: Linguistic
      Reason: off topic; linguistic code
    • eg ...
      Description: contains any kind of illustrative example
      Since: P1
      Type: Programmatic
      Reason: only used to document TEI
    • egXML ...
      Description: contains a single well-formed XML fragment demonstrating the use of some XML element or attribute
      Since: P5 1.0.0
      Type: Programmatic
      Reason: only used to document TEI
    • eLeaf ...
      Description: provides explicitly for a leaf of an embedding tree, which may also be encoded with the eTree element
      Since: P2
      Type: Programmatic
      Reason: off topic; for technical documentation only
    • elementRef ...
      Description: points to the specification for some element which is to be included in a schema
      Since: P5 1.7.0
      Type: Programmatic
      Reason: off topic; for technical documentation only
    • elementSpec ...
      Description: documents the structure, content, and purpose of a single element type
      Since: P5 1.0.0
      Type: Programmatic
      Reason: off topic; for technical documentation only
    • email ...
      Description: contains an e-mail address identifying a location to which e-mail messages can be delivered
      Since: P5 1.0.0
      Type: General
    • emph ...
      Description: marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect
      Since: P1
      Type: General
    • encodingDesc ...
      Description: documents the relationship between an electronic text and the source or sources from which it was derived
      Since: P2
      Type: Metadata
      Reason: use ead:daodesc
    • entry ...
      Description: contains a single structured entry in any kind of lexical resource
      Since: P3
      Type: Linguistic
      Reason: off topic
    • entryFree ...
      Description: contains an unstructured entry in any kind of lexical resource
      Since: P3
      Type: Linguistic
      Reason: off topic
    • epigraph ...
      Description: contains a quotation, anonymous or attributed, appearing at the start or end of a section or on a title page
      Since: P1
      Type: Literary
    • epilogue ...
      Description: contains the epilogue to a drama, typically spoken by an actor out of character, possibly in association with a particular performance or venue
      Since: P1
      Type: Literary
    • equipment ...
      Description: provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text
      Since: P2
      Type: Linguistic
      Reason: off topic
    • equiv ...
      Description: specifies a component which is considered equivalent to the parent element, either by co-reference, or by external link
      Since: P2
      Type: Programmatic
      Reason: only for documenting SGML/XML
    • eTree ...
      Description: provides an alternative to tree element for representing ordered rooted tree structures
      Since: P2
      Type: Programmatic
      Reason: off topic; for technical documentation only
    • etym ...
      Description: encloses the etymological information in a dictionary entry
      Since: P3
      Type: Linguistic
      Reason: off topic
    • event ...
      Description: contains data relating to any kind of significant event associated with a person, place, or organization
      Since: P2
      Type: Linguistic
      Reason: off topic
    • ex ...
      Description: contains a sequence of letters added by an editor or transcriber when expanding an abbreviation
      Since: P5 1.0.0
      Type: Variant
      Reason: better computed by software; very fine detail.
    • exemplum ...
      Description: groups an example demonstrating the use of an element along with optional paragraphs of commentary
      Since: P2
      Type: Programmatic
      Reason: only useful for formatting the TEI Guidelines.
    • expan ...
      Description: contains the expansion of an abbreviation
      Since: P2
      Type: Variant
      Reason: Usually coupled with <abbr> and so defines alternatives; better represented externally
    • explicit ...
      Description: contains the explicit of a manuscript item, that is, the closing words of the text proper, exclusive of any rubric or colophon which might follow it
      Since: P5 1.0.0
      Type: Literary
      Reason: highly specialised and probably of limited use
    • extent ...
      Description: describes the approximate size of a text as stored on some carrier medium
      Since: P1
      Type: Metadata
      Reason: use dcterms:extent, mods:extent, ead:extent
  • f to fw
    • f ...
      Description: represents a feature value specification
      Since: P3
      Type: Linguistic
      Reason: off topic
    • facsimile ...
      Description: contains a representation of some written source in the form of a set of images rather than as transcribed or encoded text
      Since: P5 2.3.0
      Type: Literary
      Reason: doesn't qualify text; facsimile is embedded but is not part of the text; complex all-in-one file approach; implement through external markup
    • factuality ...
      Description: describes the extent to which the text may be regarded as imaginative or non-imaginative
      Since: P2
      Type: Linguistic; Metadata
      Reason: linguistic code
    • faith ...
      Description: specifies the faith, religion, or belief set of a person
      Since: P5 1.0.0
      Type: General
      Reason: off topic
    • fDecl ...
      Description: declares a single feature, specifying its name, organization, range of allowed values, and optionally its default value
      Since: P3
      Type: Linguistic
      Reason: linguistic code
    • fDescr ...
      Description: describes in prose what is represented by the feature being declared and its values
      Since: P3
      Type: Linguistic
      Reason: linguistic code
    • figDesc ...
      Description: contains a brief prose description of the appearance or content of a graphic figure
      Since: P3
      Type: General
      Reason: equivalent to the alt attribute on <img> in HTML
    • figure ...
      Description: groups elements representing or containing graphical information such as an illustration, formula, or figure
      Since: P1
      Type: General
      May contain: 115
      Contained by: 268
    • fileDesc ...
      Description: contains a full bibliographic description of an electronic file
      Since: P2
      Type: Metadata
      May contain: 7
      Contained by: 1
      Reason: use mods:originInfo, ead:filedesc
    • filiation ...
      Description: contains information concerning the manuscript's filiation, i.e. its relationship to other surviving manuscripts of the same text, its protographs, antigraphs and apographs
      Since: P5 1.0.0
      Type: Metadata
      May contain: 207
      Contained by: 1
      Reason: use mods:relatedItem
    • finalRubric ...
      Description: contains the string of words that denotes the end of a text division
      Since: P5 1.0.0
      Type: Literary
      May contain: 157
      Contained by: 2
    • fLib ...
      Description: assembles a library of feature elements
      Since: P3
      Type: Linguistic
      May contain: 1
      Contained by: 266
      Reason: linguistic code
    • floatingText ...
      Description: contains a single text of any kind, whether unitary or composite, which interrupts the text containing it at any point and after which the surrounding text resumes.
      Since: P5 1.0.0
      Type: Literary
      May contain: 47
      Contained by: 120
    • floruit ...
      Description: contains information about a person's period of activity
      Since: P5 2.6.0
      Type: General
      May contain: 157
      Contained by: 2
      Reason: doesn't qualify text; database information
    • foliation ...
      Description: describes the numbering system or systems used to count the leaves or pages in a codex
      Since: P5 2.6.0
      Type: Metadata
      May contain: 208
      Contained by: 1
      Reason: use ead:physdesc
    • foreign ...
      Description: identifies a word or phrase as belonging to some language other than that of the surrounding text
      Since: P1
      Type: General
      May contain: 157
      Contained by: 221
    • forename ...
      Description: contains a forename, given or baptismal name
      Since: P3
      Type: General
      May contain: 157
      Contained by: 223
    • forest ...
      Description: provides for groups of rooted trees
      Since: P1
      Type: Programmatic
      May contain: 3
      Contained by: 50
      Reason: off topic
    • form ...
      Description: groups all the information on the written and spoken forms of one headword
      Since: P1
      Type: Linguistic
      May contain: 215
      Contained by: 10
      Reason: linguistic code
    • formula ...
      Description: contains a mathematical or other formula
      Since: P1
      Type: General
      May contain: 5
      Contained by: 197
    • front ...
      Description: contains any prefatory matter
      Since: P2
      Type: General
      May contain: 71
      Contained by: 3
      Reason: IBM document model; replace with separate file
    • fs ...
      Description: represents a feature structure
      Since: P2
      Type: Linguistic
      May contain: 1
      Contained by: 271
      Reason: programmatic; linguistic code
    • fsConstraints ...
      Description: specifies constraints on the content of valid feature structures
      Since: P3
      Type: Linguistic
      May contain: 2
      Contained by: 1
      Reason: programmatic; linguistic code
    • fsdDecl ...
      Description: provides a feature system declaration comprising one or more feature structure declarations or feature structure declaration links
      Since: P3
      Type: Linguistic
      May contain: 3
      Contained by: 2
      Reason: programmatic; linguistic code
    • fsDecl ...
      Description: declares one type of feature structure
      Since: P3
      Type: Linguistic
      May contain: 3
      Contained by: 1
      Reason: programmatic; linguistic code
    • fsDescr ...
      Description: describes in prose what is represented by the type of feature structure declared in the enclosing fsDecl
      Since: P3
      Type: Linguistic
      May contain: 114
      Contained by: 1
      Reason: programmatic; linguistic code
    • fsdLink ...
      Description: associates the name of a typed feature structure with a feature structure declaration for it
      Since: P5 1.0.0
      Type: Linguistic
      May contain: 0
      Contained by: 1
      Reason: programmatic; linguistic code
    • funder ...
      Description: specifies the name of an individual, institution, or organization responsible for the funding of a project or text
      Since: P5 1.0.0
      Type: Metadata
      May contain: 119
      Contained by: 5
      Reason: use ead:origination, mods:originInfo
    • fvLib ...
      Description: assembles a library of reusable feature value elements
      Since: P3
      Type: Linguistic
      May contain: 11
      Contained by: 259
      Reason: programmatic; linguistic code
    • fw ...
      Description: contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page
      Since: P3
      Type: Literary
      May contain: 157
      Contained by: 261
  • g to group
    • g ...
      Description: represents a glyph, or a non-standard character
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 0
      Contained by: 28
      Reason: programmatic; should be externally defined; use Unicode
    • gap ...
      Description: indicates a point where material has been omitted in a transcription
      Since: P3
      Type: Literary
      May contain: 4
      Contained by: 1
    • gen ...
      Description: identifies the morphological gender of a lexical item, as given in the dictionary
      Since: P3
      Type: Linguistic
      May contain: 171
      Contained by: 2
      Reason: linguistic code
    • genName ...
      Description: contains a name component used to distinguish otherwise similar names on the basis of the relative ages or generations of the persons
      Since: P3
      Type: General
      May contain: 134
      Contained by: 1
    • geo ...
      Description: contains any expression of a set of geographic coordinates, representing a point, line, or area on the surface of the earth in some notation
      Since: P5 1.0.0
      Type: Geographic
      May contain: 0
      Contained by: 1
      Reason: off topic; use GML
    • geoDecl ...
      Description: documents the notation and the datum used for geographic coordinates expressed as content of the geo element
      Since: P5 1.0.0
      Type: Geographic
      May contain: 134
      Contained by: 1
      Reason: off topic; use GML
    • geoFeat ...
      Description: contains a common noun identifying some geographical feature contained within a geographic name, such as valley, mount etc.
      Since: P5 1.0.0
      Type: Geographic
      May contain: 1
      Contained by: 1
      Reason: off topic; use GML
    • geogName ...
      Description: a name associated with some geographical feature such as Windrush Valley or Mount Sinai
      Since: P3
      Type: Geographic
      May contain: 134
      Contained by: 1
      Reason: off topic; use GML
    • gi ...
      Description: contains the name (generic identifier) of an element
      Since: P2
      Type: Programmatic
      May contain: 0
      Contained by: 1
      Reason: programmatic; only useful for documenting TEI
    • gloss ...
      Description: identifies a phrase or word used to provide a gloss or definition for some other word or phrase
      Since: P1
      Type: Annotation
      May contain: 134
      Contained by: 2
      Reason: annotation; should be defined externally
    • glyph ...
      Description: provides descriptive information about a character glyph
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 12
      Contained by: 1
      Reason: programmatic; should be external
    • glyphName ...
      Description: contains the name of a glyph, expressed following Unicode conventions for character names
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 0
      Contained by: 1
      Reason: programmatic; should be external
    • gram ...
      Description: within an entry in a dictionary or a terminological data file, contains grammatical information relating to a term, word, or form
      Since: P2
      Type: Linguistic
      May contain: 171
      Contained by: 1
      Reason: linguistic code
    • gramGrp ...
      Description: groups morpho-syntactic information about a lexical item
      Since: P3
      Type: Linguistic
      May contain: 185
      Contained by: 3
      Reason: linguistic code
    • graph ...
      Description: encodes a graph, which is a collection of nodes, and arcs which connect the nodes
      Since: P2
      Type: Programmatic
      May contain: 39
      Contained by: 1
      Reason: programmatic; off topic
    • graphic ...
      Description: indicates the location of an inline graphic, illustration, or figure
      Since: P5 1.0.0
      Type: General
      May contain: 0
      Contained by: 2
      Reason: merge with <figure>
    • group ...
      Description: contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit
      Since: P2
      Type: General
      May contain: 52
      Contained by: 3
      Reason: too high-level; use a separate file; has no meaning
  • handDesc to hyphenation
    • handDesc ...
      Description: contains a description of all the different kinds of writing used in a manuscript
      Since: P5 1.0.0
      Type: Metadata
      May contain: 4
      Contained by: 1
      Reason: not part of a "finding aid"; too detailed; should be part of the description of a separate layer; or use ead:physdesc
    • handNote ...
      Description: describes a particular style or hand distinguished within a manuscript
      Since: P5 1.0.0
      Type: Metadata
      May contain: 208
      Contained by: 2
      Reason: not part of a "finding aid"; too detailed; use ead:physfacet
    • handNotes ...
      Description: contains one or more handNote elements documenting the different hands identified within the source texts
      Since: P5 1.0.0
      Type: Metadata
      May contain: 1
      Contained by: 1
      Reason: virtually identical to handDesc, and should be merged with it
    • handShift ...
      Description: marks the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint
      Since: P3
      Type: Literary
      May contain: 0
      Contained by: 194
    • head ...
      Description: contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc.
      Since: P1
      Type: General
      May contain: 197
      Contained by: 44
    • headItem ...
      Description: contains the heading for the item or gloss column in a glossary list or similar structured list
      Since: P3
      Type: General
      May contain: CHECK
      Contained by: 1
    • headLabel ...
      Description: contains the heading for the label or term column in a glossary list or similar structured list
      Since: P3
      Type: General
      May contain: 157
      Contained by: 1
    • height ...
      Description: contains a measurement measured along the axis at right angles to the bottom of the written surface, i.e. parallel to the spine for a codex or book
      Since: P5 1.0.0
      Type: Metadata
      May contain: 1
      Contained by: 220
      Reason: use ead:dimensions; mods:extent etc.
    • heraldry ...
      Description: contains a heraldic formula or phrase, typically found as part of a blazon, coat of arms, etc.
      Since: P3
      Type: Literary
      May contain: 157
      Contained by: 220
    • hi ...
      Description: marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.
      Since: P2
      Type: General
      May contain: 196
      Contained by: 225
      Reason: Too abstract; replace with standard format tags; "rend" attribute also not from controlled vocabulary.
    • history ...
      Description: groups elements describing the full history of a manuscript or manuscript part.
      Since: P5 1.0.0
      Type: Metadata
      May contain: 6
      Contained by: 2
      Reason: use dcterms:provenance; ead:custodialhist etc.
    • hom ...
      Description: groups information relating to one homograph within an entry
      Since: P3
      Type: Linguistic
      May contain: 53
      Contained by: 5
      Reason: linguistic code
    • hyph ...
      Description: contains a hyphenated form of a dictionary headword, or hyphenation information in some other form
      Since: P3
      Type: Linguistic
      May contain: 53
      Contained by: 5
      Reason: linguistic code
    • hyphenation ...
      Description: summarizes the way in which hyphenation in a source text has been treated in an encoded version of it
      Since: P4
      Type: Metadata
      May contain: 2
      Contained by: 1
      Reason: no exact match in standard schemes
  • ident to iType
    • ident ...
      Description: contains an identifier or name for an object of some kind in a formal language
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 0
      Contained by: 221
      Reason: technical requirement; doesn't qualify text
    • idno ...
      Description: supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way
      Since: P1
      Type: Programmatic
      May contain: 2
      Contained by: 236
      Reason: programmatic requirement; doesn't qualify cultural heritage text
    • if ...
      Description: defines a conditional default value for a feature; the condition is specified as a feature structure, and is met if it subsumes the feature structure in the text for which a default value is sought
      Since: P1
      Type: Linguistic
      May contain: 13
      Contained by: 1
      Reason: linguistic code
    • iff ...
      Description: separates the condition from the consequence in a bicond element
      Since: P3
      Type: Linguistic
      May contain: 0
      Contained by: 1
      Reason: linguistic code
    • imprimatur ...
      Description: contains a formal statement authorizing the publication of a work, sometimes required to appear on a title page or its verso
      Since: P2
      Type: Literary
      May contain: 196
      Contained by: 2
    • imprint ...
      Description: groups information relating to the publication or distribution of a bibliographic item
      Since: P1
      Type: Literary
      May contain: 1
      Contained by: 259
    • incident ...
      Description: marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication
      Since: P5 1.0.0
      Type: Linguistic
      May contain: 1
      Contained by: 259
      Reason: linguistic code
    • incipit ...
      Description: contains the incipit of a manuscript item, that is the opening words of the text proper, exclusive of any rubric which might precede it
      Since: P5 1.0.0
      Type: Metadata
      May contain: 157
      Contained by: 2
      Reason: copies text into the metadata; off-topic
    • index ...
      Description: marks a location to be indexed for whatever purpose
      Since: P1
      Type: Programmatic
      May contain: 2
      Contained by: 260
      Reason: word-processing code
    • iNode ...
      Description: represents an intermediate (or internal) node of a tree
      Since: P2
      Type: Programmatic
      May contain: 1
      Contained by: 1
      Reason: off topic
    • institution ...
      Description: contains the name of an organization such as a university or library, with which a manuscript is identified, generally its holding institution
      Since: P5 1.0.0
      Type: Metadata
      May contain: 1
      Contained by: 2
      Reason: use ead:corpname or mods:physicalLocation
    • interaction ...
      Description: describes the extent, cardinality and nature of any interaction among those producing and experiencing the text, for example in the form of response or interjection, commentary, etc.
      Since: P2
      Type: Linguistic; Metadata
      May contain: 119
      Contained by: 1
      Reason: linguistic code
    • interp ...
      Description: summarizes a specific interpretative annotation which can be linked to a span of text
      Since: P2
      Type: Annotation
      May contain: 5
      Contained by: 260
      Reason: should be external to the text
    • interpGrp ...
      Description: collects together a set of related interpretations which share responsibility or type
      Since: P3
      Type: Annotation
      May contain: 2
      Contained by: 259
      Reason: should be external to the text
    • interpretation ...
      Description: describes the scope of any analytic or interpretive information added to the text in addition to the transcription
      Since: P2
      Type: Metadata
      May contain: 2
      Contained by: 1
      Reason: nonsensical: all markup and transcription is interpretation; use mods:note
    • item ...
      Description: contains one component of a list
      Since: P1
      Type: General
      May contain: 208
      Contained by: 1
    • iType ...
      Description: indicates the inflectional class associated with a lexical item
      Since: P3
      Type: Linguistic
      May contain: 196
      Contained by: 5
      Reason: linguistic code; formerly itype
  • join to joinGrp
    • join ...
      Description: identifies a possibly fragmented segment of text, by pointing at the possibly discontiguous elements which compose it
      Since: P2
      Type: Programmatic
      May contain: 4
      Contained by: 260
      Reason: creates non-computable structures
    • joinGrp ...
      Description: groups a collection of join elements and possibly pointers
      Since: P3
      Type: Programmatic
      May contain: 5
      Contained by: 259
      Reason: creates non-computable structures
  • keywords to kinesic
    • keywords ...
      Description: contains a list of keywords or phrases identifying the topic or nature of a text
      Since: P2
      Type: Metadata
      May contain: 1
      Contained by: 1
      Reason: use dcterms:subject; ead:subject; mods:subject
    • kinesic ...
      Description: marks any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.
      Since: P2
      Type: Linguistic
      May contain: 1
      Contained by: 259
      Reason: linguistic code
  • l to locusGrp
    • l ...
      Description: contains a single, possibly incomplete, line of verse
      Since: P2
      Type: Literary
      May contain: 196
      Contained by: 53
    • label ...
      Description: contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary
      Since: P1
      Type: General
      May contain: 157
      Contained by: 142
    • lacunaEnd ...
      Description: indicates the end of a lacuna in a mostly complete textual witness
      Since: P3
      Type: Variant
      May contain: 0
      Contained by: 2
      Reason: creates alternatives; use layers instead
    • lacunaStart ...
      Description: indicates the start of a lacuna in a mostly complete textual witness
      Since: P3
      Type: Variant
      May contain: 0
      Contained by: 2
      Reason: creates alternatives; use layers instead
    • lang ...
      Description: contains the name of a language mentioned in etymological or other linguistic discussion
      Since: P3
      Type: Linguistic
      May contain: 196
      Contained by: 223
      Reason: linguistic code
    • langKnowledge ...
      Description: summarizes the state of a person's linguistic knowledge, either as prose or by a list of langKnown elements
      Since: P5 1.0.0
      Type: Linguistic
      May contain: 3
      Contained by: 2
      Reason: linguistic code; only needed to group <langKnown> elements
    • langKnown ...
      Description: summarizes the state of a person's linguistic competence, i.e., knowledge of a single language
      Since: P2
      Type: Linguistic
      May contain: 119
      Contained by: 1
      Reason: linguistic code; only needed to group <langKnown> elements
    • language ...
      Description: characterizes a single language or sublanguage used within a text
      Since: P2
      Type: Metadata
      May contain: 119
      Contained by: 1
      Reason: use ead:language; dc:language; mods:languageTerm
    • langUsage ...
      Description: describes the languages, sublanguages, registers, dialects, etc. represented within a text
      Since: P2
      Type: Metadata
      May contain: 1
      Contained by: 1
      Reason: use ead:langusage; mods:language
    • layout ...
      Description: describes how text is laid out on the page, including information about any ruling, pricking, or other evidence of page-preparation techniques
      Since: P5 1.0.0
      Type: Metadata
      May contain: 208
      Contained by: 1
      Reason: use ead:physfacet; no exact equivalent; too detailed
    • layoutDesc ...
      Description: collects the set of layout descriptions applicable to a manuscript
      Since: P5 1.0.0
      Type: Metadata
      May contain: 4
      Contained by: 1
      Reason: use ead:physdesc; no exact equivalent; too detailed
    • lb ...
      Description: marks the start of a new (typographic) line in some edition or version of a text
      Since: P2
      Type: Literary
      May contain: 0
      Contained by: 261
      Reason: only needed because XML can't do overlap
    • lbl ...
      Description: contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms
      Since: P3
      Type: Linguistic
      May contain: 196
      Contained by: 8
      Reason: linguistic code
    • leaf ...
      Description: encodes the leaves (terminal nodes) of a tree
      Since: P2
      Type: Programmatic
      May contain: 1
      Contained by: 1
      Reason: off topic
    • lem ...
      Description: contains the lemma, or base text, of a textual variation
      Since: P1
      Type: Variant
      May contain: 200
      Contained by: 2
      Reason: creates alternatives; better represented via layers
    • lg ...
      Description: contains one or more verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
      Since: P1
      Type: Literary
      May contain: 68
      Contained by: 105
      Reason: too abstract; should be represented by separate tags for stanza etc.
    • licence ...
      Description: contains information about a licence or other legal agreement applicable to the text
      Since: P5 2.0.0
      Type: Metadata
      May contain: 208
      Contained by: 1
      Reason: use mods:accessCondition; dc:license
    • line ...
      Description: contains the transcription of a topographic line in the source document
      Since: P5 2.0.0
      Type: Topographical
      May contain: 73
      Contained by: 3
      Reason: use SVG
    • link ...
      Description: defines an association or hypertextual link among elements or passages, of some type not more precisely specifiable by other elements
      Since: P2
      Type: Programmatic
      May contain: 0
      Contained by: 263
      Reason: must not be used to link text segments; only as a way to encode hypertextual links.
    • linkGrp ...
      Description: defines a collection of associations or hypertextual links
      Since: P2
      Type: Programmatic
      May contain: 2
      Contained by: 247
      Reason: creates potentially non-computable structures; only needed because XML does not support non-hierarchical structures
    • list ...
      Description: contains any sequence of items organized as a list
      Since: P1
      Type: General
      May contain: 61
      Contained by: 124
    • listApp ...
      Description: contains a list of apparatus entries
      Since: P5 2.2.0
      Type: Variant
      May contain: 3
      Contained by: 123
      Reason: Only used for writing a separate apparatus, detached from the text
    • listBibl ...
      Description: contains a list of bibliographic citations of any kind
      Since: P3
      Type: General
      May contain: 13
      Contained by: 139
      Reason: Just use <list>
    • listChange ...
      Description: groups a number of change descriptions associated with either the creation of a source text or the revision of an encoded text
      Since: P5 2.0.0
      Type: Metadata
      May contain: 3
      Contained by: 1
      Reason: not needed: revisionDesc already lists changes; or use ead:revisiondesc
    • listEvent ...
      Description: contains a list of descriptions, each of which provides information about an identifiable event
      Since: P5 2.6.0
      Type: General
      May contain: 6
      Contained by: 125
      Reason: specific list types for each content type are not needed
    • listForest ...
      Description: provides for lists of forests
      Since: P5 2.0.0
      Type: Programmatic
      May contain: 1
      Contained by: 49
      Reason: off topic
    • listNym ...
      Description: contains a list of nyms, that is, standardized names for any thing
      Since: P5 1.0.0
      Type: General
      May contain: 6
      Contained by: 123
      Reason: unnecessary: just use <list>: you don't need a special list element for each kind of content.
    • listOrg ...
      Description: contains a list of elements, each of which provides information about an identifiable organization
      Since: P5 1.0.0
      Type: General
      May contain: 6
      Contained by: 125
      Reason: unnecessary: just use <list>: you don't need a special list element for each kind of content.
    • listPerson ...
      Description: contains a list of descriptions, each of which provides information about an identifiable person or a group of people
      Since: P5 1.0.0
      Type: General
      May contain: 8
      Contained by: 125
      Reason: unnecessary: just use <list>: you don't need a special list element for each kind of content.
    • listPlace ...
      Description: contains a list of places, optionally followed by a list of relationships (other than containment) defined amongst them
      Since: P5 1.0.0
      Type: General
      May contain: 6
      Contained by: 126
      Reason: unnecessary: just use <list>: you don't need a special list element for each kind of content.
    • listPrefixDef ...
      Description: contains a list of definitions of prefixing schemes used in data.pointer values, showing how abbreviated URIs using each scheme may be expanded into full URIs
      Since: P5 2.3.0
      Type: Programmatic
      May contain: 2
      Contained by: 2
      Reason: documentation for XML only.
    • listRef ...
      Description: supplies a list of significant references to places where this element is discussed, in the current document or elsewhere
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 1
      Contained by: 124
      Reason: documentation for XML only.
    • listRelation ...
      Description: provides information about relationships identified amongst people, places, and organizations, either informally as prose or as formally expressed relation links
      Since: P5 2.0.0
      Type: General
      May contain: 3
      Contained by: 6
      Reason: off topic; for programming purposes; doesn't really qualify text
    • listTranspose ...
      Description: supplies a list of transpositions, each of which is indicated at some point in a document typically by means of metamarks
      Since: P5 2.0.0
      Type: Variant
      May contain: 3
      Contained by: 260
      Reason: transpositions should be computed
    • listWit ...
      Description: lists definitions for all the witnesses referred to by a critical apparatus, optionally grouped hierarchically
      Since: P5 1.0.0
      Type: Variant
      May contain: 3
      Contained by: 123
      Reason: formerly witList (P3); should be encoded as version-descriptions
    • locale ...
      Description: contains a brief informal description of the kind of place concerned, for example: a room, a restaurant, a park bench, etc.
      Since: P2
      Type: Linguistic
      May contain: 119
      Contained by: 1
      Reason: off topic
    • localName ...
      Description: contains a locally defined name for some property
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 0
      Contained by: 1
      Reason: off topic
    • location ...
      Description: defines the location of a place as a set of geographical coordinates, in terms of other named geo-political entities, or as an address
      Since: P5 1.0.0
      Type: Geographic
      May contain: 29
      Contained by: 224
      Reason: use GML
    • locus ...
      Description: defines a location within a manuscript or manuscript part, usually as a (possibly discontinuous) sequence of folio references
      Since: P5 1.0.0
      Type: Annotation
      May contain: 1
      Contained by: 223
      Reason: use external markup or annotation for referencing
    • locusGrp ...
      Description: groups a number of locations which together form a distinct but discontinuous item within a manuscript or manuscript part, according to a specific foliation
      Since: P5 1.3.0
      Type: Annotation
      May contain: 1
      Contained by: 222
      Reason: use external markup or annotation for referencing
  • m to musicNotation
    • m ...
      Description: represents a grammatical morpheme
      Since: P3
      Type: Linguistic
      May contain: 47
      Contained by: 193
      Reason: linguistic code
    • macroRef ...
      Description: points to the specification for some pattern which is to be included in a schema
      Since: P5 1.7.0
      Type: Programmatic
      May contain: 0
      Contained by: 5
      Reason: services the needs of XML only
    • macroSpec ...
      Description: documents the function and implementation of a pattern
      Since: P5 1.7.0
      Type: Programmatic
      May contain: 10
      Contained by: 120
      Reason: services the needs of XML only
    • mapping ...
      Description: contains one or more characters which are related to the parent character or glyph in some respect, as specified by the type attribute
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 1
      Contained by: 2
      Reason: off topic; for defining new characters
    • material ...
      Description: contains a word or phrase describing the material of which the object being described is composed
      Since: P5 1.0.0
      Type: Metadata
      May contain: 157
      Contained by: 220
      Reason: use ead:physfacet
    • measure ...
      Description: contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name
      Since: P2
      Type: General
      May contain: 157
      Contained by: 223
    • measureGrp ...
      Description: contains a group of dimensional specifications which relate to the same object, for example the height and width of a manuscript page
      Since: P4
      Type: Programmatic
      May contain: 9
      Contained by: 223
      Reason: doesn't qualify text; practically metadata
    • media ...
      Description: indicates the location of any form of external media such as an audio or video clip etc
      Since: P5 2.3.0
      Type: Annotation
      May contain: 1
      Contained by: 198
      Reason: doesn't describe document content
    • meeting ...
      Description: contains the formalized descriptive title for a meeting or conference
      Since: P2
      Type: General
      May contain: 114
      Contained by: 24
      Reason: off topic; too specialised for a general schema
    • memberOf ...
      Description: specifies class membership of the documented element or class
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 1
      Contained by: 1
      Reason: only used for documenting TEI
    • mentioned ...
      Description: marks words or phrases mentioned, not used
      Since: P1
      Type: General
      May contain: 157
      Contained by: 221
    • metamark ...
      Description: contains or describes any kind of graphic or written signal within a document the function of which is to determine how it should be read rather than forming part of the actual content of the document
      Since: P5 2.0.0
      Type: Literary
      May contain: 208
      Contained by: 259
    • metDecl ...
      Description: documents the notation employed to represent a metrical pattern
      Since: P3
      Type: Metadata
      May contain: 5
      Contained by: 1
      Reason: probably unnecessary: should be built into the markup language for metre
    • metSym ...
      Description: documents the intended significance of a particular character or character sequence within a metrical notation
      Since: P5 1.0.0
      Type: Metadata
      May contain: 119
      Contained by: 1
      Reason: probably unnecessary: should be built into the markup language for metre
    • milestone ...
      Description: marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element
      Since: P1
      Type: General
      May contain: 0
      Contained by: 261
      Reason: doesn't describe text; only needed because XML doesn't do overlap
    • mod ...
      Description: represents any kind of modification identified within a single document
      Since: P5 2.0.0
      Type: Literary
      May contain: 196
      Contained by: 194
      Reason: more or less the same as <subst>; better to use layers; computationally redundant
    • moduleRef ...
      Description: references a module which is to be incorporated into a schema
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 1
      Contained by: 2
      Reason: only used to format XML documentation like TEI
    • moduleSpec ...
      Description: documents the structure, content, and purpose of a single module, i.e. a named and externally visible group of declarations
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 7
      Contained by: 120
      Reason: only used to format XML documentation like TEI
    • monogr ...
      Description: contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item (i.e. as a separate physical object)
      Since: P2
      Type: General
      May contain: 18
      Contained by: 1
      Reason: fossilised survivor from P1/P2; where are the corresponding article, book, chapter, inproceedings, electronic and other types of bibliographic items?
    • mood ...
      Description: contains information about the grammatical mood of verbs
      Since: P5 2.6.0
      Type: Linguistic
      May contain: 196
      Contained by: 5
      Reason: linguistic code
    • move ...
      Description: marks the actual entrance or exit of one or more characters on stage
      Since: P2
      Type: Programmatic; Literary
      May contain: 0
      Contained by: 121
      Reason: doesn't qualify text
    • msContents ...
      Description: describes the intellectual content of a manuscript or manuscript part, either as a series of paragraphs or as a series of structured manuscript items
      Since: P5 1.0.0
      Type: Metadata
      May contain: 7
      Contained by: 2
      Reason: use ead:c; dcterms:hasPart; mods:titleInfo/partNumber etc
    • msDesc ...
      Description: contains a description of a single identifiable manuscript or other text-bearing object
      Since: P5 1.0.0
      Type: Metadata
      May contain: 9
      Contained by: 137
      Reason: use ead:did
    • msIdentifier ...
      Description: contains the information required to identify the manuscript being described
      Since: P5 1.0.0
      Type: Metadata
      May contain: 13
      Contained by: 3
      Reason: use mods:identifier
    • msItem ...
      Description: describes an individual work or item within the intellectual content of a manuscript or manuscript part
      Since: P5 1.0.0
      Type: Metadata
      May contain: 85
      Contained by: 2
      Reason: use ead:did; mods:typeOfResource
    • msItemStruct ...
      Description: contains a structured description for an individual work or item within the intellectual content of a manuscript or manuscript part
      Since: P5 1.0.0
      Type: Metadata
      May contain: 20
      Contained by: 3
      Reason: same as <msItem>
    • msName ...
      Description: contains any form of unstructured alternative name used for a manuscript, such as an ‘ocellus nominum’, or nickname
      Since: P5 1.0.0
      Type: Metadata
      May contain: 1
      Contained by: 1
      Reason: use ead:unittitle; dcterms:alternative; etc.
    • msPart ...
      Description: contains information about an originally distinct manuscript or part of a manuscript, now forming part of a composite manuscript
      Since: P5 1.0.0
      Type: Metadata
      May contain: 10
      Contained by: 2
      Reason: use ead:c; same as msItem and msItemStruct
    • musicNotation ...
      Description: contains description of type of musical notation
      Since: P5 1.0.0
      Type: Metadata
      May contain: 208
      Contained by: 1
      Reason: use ead:odd; no close match in other metadata standards
  • name to nym
    • name ...
      Description: contains a proper noun or noun phrase
      Since: P1
      Type: General
      May contain: 157
      Contained by: 225
    • nameLink ...
      Description: contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of
      Since: P3
      Type: Linguistic
      May contain: 157
      Contained by: 223
      Reason: linguistic code; no one else regards "van der" as not part of a name
    • namespace ...
      Description: supplies the formal name of the namespace to which the elements documented by its children belong
      Since: P5 1.0.0
      Type: Programmatic
      May contain: 1
      Contained by: 1
      Reason: only useful for documenting XML or TEI
    • nationality ...
      Description: contains an informal description of a person's present or past nationality or citizenship
      Since: P5 1.0.0
      Type: General
      May contain: 157
      Contained by: 2
      Reason: off topic
    • node ...
      Description: encodes a node, a possibly labeled point in a graph
      Since: P1
      Type: Programmatic
      May contain: 1
      Contained by: 1
      Reason: off topic
    • normalization ...
      Description: indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form
      Since: P1
      Type: Metadata
      May contain: 2
      Contained by: 1
      Reason: use ead:descrules; mods:physicalDescription:note or mods:transliteration
    • notatedMusic ...
      Description: encodes the presence of music notation in a text
      Since: P5 2.0.0
      Type: General
      May contain: 6
      Contained by: 259
    • note ...
      Description: contains a note or annotation
      Since: P1
      Type: Annotation
      May contain: 208
      Contained by: 277
      Reason: should be external not embedded
    • notesStmt ...
      Description: collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description
      Since: P2
      Type: Metadata
      May contain: 3
      Contained by: 2
      Reason: use ead:notestmt or <note> within various containers in mods
    • num ...
      Description: contains a number, written in any form
      Since: P1
      Type: General
      May contain: 157
      Contained by: 223
    • number ...
      Description: indicates grammatical number associated with a form, as given in a dictionary
      Since: P1
      Type: Linguistic
      May contain: 196
      Contained by: 5
      Reason: linguistic code
    • numeric ...
      Description: represents the value part of a feature-value specification which contains a numeric value or range
      Since: P5 1.0.0
      Type: Linguistic
      May contain: 0
      Contained by: 11
      Reason: linguistic code; almost the same as <num>
    • nym ...
      Description: contains the definition for a canonical name or name component of any kind
      Since: P5 1.0.0
      Type: Linguistic
      May contain: 2
      Contained by: 21
      Reason: linguistic code
  • objectDesc to oVar
    • objectDesc ...
      Description: contains a description of the physical components making up the object which is being described
      Since: P5 1.0.0
      Type: Metadata
      May contain: 4
      Contained by: 1
      Reason: use ead:physdesc
    • objectType ...
      Description: contains a word or phrase describing the type of object being referred to
      Since: P5 1.9.0
      Type: Metadata
      May contain: 157
      Contained by: 220
      Reason: use ead:physfacet
    • occupation ...
      Description: contains an informal description of a person's trade, profession or occupation
      Since: P2
      Type: General
      May contain: 157
      Contained by: 2
      Reason: should be annotation; off topic
    • offset ...
      Description: marks that part of a relative temporal or spatial expression which indicates the direction of the offset between the two place names, dates, or times involved in the expression
      Since: P3
      Type: Programmatic
      May contain: 1
      Contained by: 224
      Reason: doesn't qualify text
    • opener ...
      Description: groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter
      Since: P2
      Type: General
      May contain: 163
      Contained by: 17
    • oRef ...
      Description: in a dictionary example, indicates a reference to the orthographic form(s) of the headword
      Since: P3
      Type: Linguistic
      May contain: 0
      Contained by: 189
      Reason: doesn't qualify text; linguistic code
    • org ...
      Description: provides information about an identifiable organization such as a business, a tribe, or any other grouping of people
      Since: P2
      Type: General
      May contain: 48
      Contained by: 4
    • orgName ...
      Description: contains an organizational name
      Since: P3
      Type: General
      May contain: 157
      Contained by: 225
    • orig ...
      Description: contains a reading which is marked as following the original, rather than being normalized or corrected
      Since: P2
      Type: Variant
      May contain: 185
      Contained by: 195
      Reason: creates alternatives; better represented via layers
    • origDate ...
      Description: contains any form of date, used to identify the date of origin for a manuscript or manuscript part
      Since: P5 1.0.0
      Type: Metadata
      May contain: 157
      Contained by: 220
      Reason: use ead:date etc.
    • origin ...
      Description: contains any descriptive or other information concerning the origin of a manuscript or manuscript part
      Since: P5 1.0.0
      Type: Metadata
      May contain: 208
      Contained by: 1
      Reason: use ead:origination; mods:origininfo
    • originPlace ...
      Description: contains any form of place name, used to identify the place of origin for a manuscript or manuscript part
      Since: P5 1.0.0
      Type: Metadata
      May contain: 157
      Contained by: 220
      Reason: use mods:origininfo:place
    • orth ...
      Description: gives the orthographic form of a dictionary headword
      Since: P3
      Type: Linguistic
      May contain: 196
      Contained by: 5
      Reason: linguistic code
    • oVar ...
      Description: in a dictionary example, indicates a reference to variant orthographic form(s) of the headword
      Since: P3
      Type: Linguistic
      May contain: 2
      Contained by: 188
      Reason: linguistic code
  • p to pVar
    • p ...
      Description: marks paragraphs in prose
      Since: P1
      Type: General
      May contain: 196
      Contained by: 120
    • particDesc ...
      Description: describes the identifiable speakers, voices, or other participants in any kind of text or other persons named or otherwise referred to in a text, edition, or metadata
      Since: P2
      Type: Metadata; Linguistic
      May contain: 7
      Contained by: 1
      Reason: technically metadata; doesn't describe text; linguistic code
    • pause ...
      Description: marks a pause either between or within utterances
      Since: P2
      Type: Linguistic
      May contain: 0
      Contained by: 259
      Reason: linguistic code
    • pb ...
      Description: marks the start of a new page in a paginated document
      Since: P2
      Type: General
      May contain: 0
      Contained by: 261
      Reason: doesn't qualify text; essentially a hack; only needed because XML can't describe overlap; replace with "page" property
    • pc ...
      Description: contains a character or string of characters regarded as constituting a single punctuation mark
      Since: P5 1.4.0
      Type: Linguistic
      May contain: 25
      Contained by: 192
      Reason: linguistic code; punctuation characters and ligatures already encoded in Unicode
    • per ...
      Description: contains an indication of the grammatical person (1st, 2nd, 3rd, etc.) associated with a given inflected form in a dictionary
      Since: P3
      Type: Linguistic
      May contain: 196
      Contained by: 5
      Reason: linguistic code
    • performance ...
      Description: contains a section of front or back matter describing how a dramatic piece is to be performed in general or how it was performed on some specific occasion
      Since: P2
      Type: Literary
      May contain: 111
      Contained by: 2
      Reason: probably unnecessary since this is really an annotation
    • persName ...
      Description: contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc
      Since: P2
      Type: Linguistic
      May contain: 157
      Contained by: 227
      Reason: linguistic code
    • person ...
      Description: provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source
      Since: P2
      Type: Linguistic
      May contain: 69
      Contained by: 3
      Reason: not really for historical texts
    • personGrp ...
      Description: describes a group of individuals treated as a single person for analytic purposes
      Since: P3
      Type: Programmatic
      May contain: 26
      Contained by: 3
      Reason: not really for historical texts
    • phr ...
      Description: represents a grammatical phrase
      Since: P3
      Type: Linguistic
      May contain: 191
      Contained by: 191
      Reason: linguistic code
    • physDesc ...
      Description: contains a full physical description of a manuscript or manuscript part
      Since: P3
      Type: Metadata
      May contain: 12
      Contained by: 2
      Reason: use ead:physdesc; mods:physicalDescription
    • place ...
      Description: contains data about a geographic location
      Since: P1
      Type: Geographic
      May contain: 31
      Contained by: 4
      Reason: off topic
    • placeName ...
      Description: contains an absolute or relative place name
      Since: P2
      Type: Geographic
      May contain: 157
      Contained by: 226
      Reason: off topic; pretty similar to <place>
    • population ...
      Description: contains information about the population of a place
      Since: P2
      Type: Geographic
      May contain: 13
      Contained by: 225
      Reason: off topic
    • pos ...
      Description: indicates the part of speech assigned to a dictionary headword such as noun, verb, or adjective
      Since: P1
      Type: Linguistic
      May contain: 185
      Contained by: 6
      Reason: linguistic code
    • postBox ...
      Description: contains a number or other identifier for some postal delivery point other than a street address
      Since: P3
      Type: General
      May contain: 0
      Contained by: 1
    • postCode ...
      Description: contains a numerical or alphanumeric code used as part of a postal address to simplify sorting or delivery of mail
      Since: P3
      Type: General
      May contain: 0
      Contained by: 1
    • postscript ...
      Description: contains a postscript, e.g. to a letter
      Since: P5 1.0.0
      Type: General
      May contain: 99
      Contained by: 20
    • precision ...
      Description: indicates the numerical accuracy or precision associated with some aspect of the text markup
      Since: P5 1.4.0
      Type: Programmatic
      May contain: 4
      Contained by: 267
      Reason: doesn't qualify text
    • pRef ...
      Description: in a dictionary example, indicates a reference to the pronunciation(s) of the headword
      Since: P3
      Type: Linguistic
      May contain: 0
      Contained by: 189
      Reason: dictionary code
    • prefixDef ...
      Description: defines a prefixing scheme used in data.pointer values, showing how abbreviated URIs using the scheme may be expanded into full URIs
      Since: P5 2.3.0
      Type: Programmatic
      May contain: 2
      Contained by: 1
      Reason: programmatic code only services the needs of XML
    • preparedness ...
      Description: describes the extent to which a text may be regarded as prepared or spontaneous
      Since: P2
      Type: Linguistic; Metadata
      May contain: 119
      Contained by: 1
      Reason: really metadata; linguistic code