Tei@DHSI 3 — Schema and customisation

Till Grallert

2 Jun 2015

Schema and customisation: producing valid TEI

The slides are based on those supplied by the various Digital Humanities Summer Schools at the University of Oxford under the Creative Commons Attribution license and have been adapted to the needs of the 2015 Introduction to TEI at DHSI.

Slides were produced using MultiMarkdown, Pandoc, Slidy JS, and the Snippet jQuery Syntax highlighter.

Customising the TEI

We will cover:

Every use of the TEI involves a customisation of the TEI.

Terminology again

What is a module?

Which modules are available?

Module name      Chapter of the P5
analysis         Simple analytical mechanisms
certainty        Certainty and responsibility
core             Elements available in all TEI documents
corpus           Language corpora
dictionaries     Dictionaries
drama            Performance texts
figures          Tables, formulae, and graphics
gaiji            Representation of non-standard characters and glyphs
header           The TEI header
iso-fs           Feature structures
linking          Linking, segmentation, and alignment
msdescription    Manuscript description
namesdates       Names, dates, people, and places
nets             Graphs, networks, and trees
spoken           Transcription of speech
tagdocs          Documentation elements
tei              The TEI infrastructure
textcrit         Critical apparatus
textstructure    Default text structure
transcr          Representation of primary sources
verse            Verse

How do you choose?

Here comes Roma: a command line script, with a web frontend, designed to make this process much easier: http://www.tei-c.org/Roma/

Roma: design a new schema

Screen shot: select a starting point


Roma: customise

Screen shot: customise metadata


Roma: schema

Screen shot: select a schema language for download


Roma: documentation

Screen shot: generate documentation


What did we just do?

We processed a pre-existing ODD file which contained (as well as some discursive prose) the following schema specification:

<schemaSpec ident="tei_bare" start="TEI">
    <moduleRef key="core"/>
    <moduleRef key="tei"/>
    <moduleRef key="header"/>
    <moduleRef key="textstructure"/>
    <elementSpec ident="abbr" mode="delete" module="core"/>
    <elementSpec ident="add" mode="delete" module="core"/>
    <!-- ... -->
    <elementSpec ident="trailer" mode="delete" module="textstructure"/>
    <elementSpec ident="title" mode="change" module="core">
        <attList>
            <attDef ident="level" mode="delete"/>
        </attList>
    </elementSpec>
    <!-- ... -->
</schemaSpec>

We selected four modules, deleted loads of elements, and also deleted an attribute.
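
The same kind of selection can also be approached from the other direction. The following is a minimal sketch, not the specification Roma produced: the schema name tei_minimal and the particular elements listed are assumptions chosen for illustration. It uses the include attribute on <moduleRef> to name the elements to keep, so that nothing has to be deleted one by one.

<schemaSpec ident="tei_minimal" start="TEI">
    <!-- the tei module supplies classes, macros, and datatypes; it is taken in full -->
    <moduleRef key="tei"/>
    <!-- from the other modules, keep only the elements named in include -->
    <moduleRef key="core" include="p title head list item"/>
    <moduleRef key="header" include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
    <moduleRef key="textstructure" include="TEI text body div"/>
</schemaSpec>

The complementary except attribute works the other way round: it names the elements to leave out and keeps everything else in the module.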

Roma provides an interface to the detail

Roma: select modules

Screen shot: select modules


Roma: edit modules

Screen shot: edit selected modules


What do we need for our newspaper?

A simple selection of elements, but also

Other constraints are possible: we might want to insist that a <div type="bill"> contains only <div type="section"> and <div type="article">, and that the latter should be numbered through an @n attribute.

The ODD advantage

We can express these constraints in our ODD meta-schema, and then generate a formal schema to enforce them using whichever schema language we like.
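
Once a schema has been generated, a document has to be pointed at it before an editor can validate against it. A minimal sketch, assuming the schema was downloaded in RELAX NG XML syntax and saved next to the document under the hypothetical file name newspaper.rng:

<?xml version="1.0" encoding="UTF-8"?>
<!-- newspaper.rng is an assumed file name for the schema generated from our ODD -->
<?xml-model href="newspaper.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <!-- teiHeader and text go here -->
</TEI>

If a different schema language is chosen for download, the file name and the schematypens value would have to change accordingly.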

Roma: select attributes

Screen shot: select and change attributes for selected elements


Roma: constrain attribute values

Screen shot: limit attributes to a list of values


What did we just do?

Our ODD now includes something like this:

<elementSpec ident="div" mode="change" module="textstructure">
    <attList>
        <attDef ident="type" mode="change" usage="req">
            <valList mode="replace" type="closed">
                <valItem ident="section"/>
                <valItem ident="article"/>
                <valItem ident="verse"/>
                <valItem ident="masthead"/>
                <valItem ident="bill"/>
                <valItem ident="letter"/>
                <!-- ... -->
            </valList>
        </attDef>
    </attList>
</elementSpec>
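
To see what this buys us, consider how a schema generated from this specification would be expected to treat a few divisions. The instance fragments below are made up for illustration and are not taken from the newspaper files.

<!-- valid: @type is present and "article" is one of the listed values -->
<div type="article" n="1">
    <p>First article of the bill.</p>
</div>
<!-- invalid: @type is now required (usage="req") but missing -->
<div>
    <p>A division without a type.</p>
</div>
<!-- invalid: "chapter" is not in the closed value list -->
<div type="chapter">
    <p>A division with an unlisted type.</p>
</div>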

Note that we can also add documentation to the ODD:

<valItem ident="verse">
    <gloss>contains (parts of) a poem</gloss>
</valItem>

Defining a new element

When defining a new element, we need to consider

The TEI class system helps us answer all these questions (except the first).

The TEI class system

TEI attribute classes

att.global: a very important attribute class

Normally, all elements are members of att.global; this class provides, among others:

Model Classes

Basic model class structure

Simplifying wildly, one may say that the TEI recognises three kinds of element:

There are ‘base model classes’ corresponding to each of these, and also to the following groupings:

And yes, there is a class model.global for elements that can appear anywhere inside a text, at any hierarchic level.

Break

Defining a new element

Roma: Defining a new element

Screen shot: defining a new element


Defining a content model

Roma: Defining a new element 2

Screen shot: defining a new element


What did we just do?

We added a new element specification to our ODD, like this:

<elementSpec ident="something" mode="add" ns="http://www.example.org/ns/nonTEI">
    <desc>contains something division-like.</desc>
    <classes>
        <memberOf key="model.divPart"/>
        <memberOf key="att.typed"/>
    </classes>
    <content>
        <rng:ref name="someThing"/> 
        <rng:oneOrMore>
            <rng:ref name="model.pLike"/>
        </rng:oneOrMore>
    </content>
</elementSpec>

Note that this new element is not in the TEI namespace. It belongs to this specific project only!
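
The content model above is written in RELAX NG. More recent releases of the TEI Guidelines also allow content models to be written in the TEI's own vocabulary. As a hedged sketch, covering only the "one or more paragraph-like elements" part of the model and not the preceding reference, that part could be expressed like this:

<content>
    <!-- one or more members of model.pLike, equivalent to the rng:oneOrMore group -->
    <classRef key="model.pLike" minOccurs="1" maxOccurs="unbounded"/>
</content>

Whether this form is available depends on the version of the Guidelines and the tools in use.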

Other kinds of constraints

Schematron constraints

An element specification can also contain a <constraintSpec> element, which contains rules about its content expressed as ISO Schematron constraints:

<elementSpec ident="div" mode="change" module="textstructure" xmlns:s="http://purl.oclc.org/dsdl/schematron">
    <constraintSpec ident="div" scheme="isoschematron">
        <constraint>
            <!-- the assertion is tested on every div, so divs that are not bills must pass -->
            <s:assert test="not(@type='bill') or .//tei:div[@type='article']">a bill must contain at least one article</s:assert>
        </constraint>
    </constraintSpec>
</elementSpec>
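
The other newspaper constraint mentioned earlier, that articles should be numbered, could be handled in the same way. This is a sketch only: the ident value and the wording of the message are made up for illustration, and the tei: prefix is assumed to be bound to the TEI namespace by the processing tools, as in the example above.

<elementSpec ident="div" mode="change" module="textstructure" xmlns:s="http://purl.oclc.org/dsdl/schematron">
    <constraintSpec ident="article-numbered" scheme="isoschematron">
        <constraint>
            <!-- an explicit rule limits the check to divisions of type article -->
            <s:rule context="tei:div[@type='article']">
                <s:assert test="@n">an article must be numbered using the n attribute</s:assert>
            </s:rule>
        </constraint>
    </constraintSpec>
</elementSpec>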

However…

- You can only add such rules by editing your ODD file: Roma doesn’t know about them.
- Not all schema languages can implement these constraints.