Till Grallert
4 Jun 2015
Covered in chapter 13 of the TEI P5 guidelines
The slides are based on those supplied by the various Digital Humanities Summer Schools at the University of Oxford under the Creative Commons Attribution license and have been adapted to the needs of the 2015 Introduction to TEI at DHSI.
Slides were produced using MultiMarkDown, Pandoc, Slidy JS, and the Snippet jQuery Syntax highlighter.
We are going to look at names of things first. Instances of names are distinct from the entities which they reference. One entity (person, place, organisation) might be known by many names.
TEI provides several ways of marking up names and nominal expressions:
<rs> (“referring string”): any phrase which refers to a person or place, e.g. ‘the girl you mentioned’, ‘my husband’ …
<name>: any lexical item recognized as a proper name, e.g. ‘Siegfried Sassoon’, ‘Calais’, ‘John Doe’ …
<persName>, <placeName>, <orgName>: ‘syntactic sugar’ for <name type="person"> etc.
<surname>, <forename>, <geogName>, <geogFeat> etc.
Recognising the need to distinguish clearly the encoding of references from the encoding of the referenced entities themselves (occurrences in the real world), the TEI provides:
<person> corresponding with <persName>
<place> corresponding with <placeName>
<org> corresponding with <orgName>
<relation>, <event> and others
Reference is a fundamental semiotic concept
Every element which is a member of the att.naming class inherits two attributes from the att.canonical class:
@key: provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.
@ref: provides an explicit means of locating a full definition for the entity being named by means of one or more URIs.
Note: Arguably, @key is redundant: since @ref is defined as anyURI, it can point from the name instance to the @xml:id of metadata about the entity, prefixed with a ‘#’ if in the same file.
The att.naming class itself adds two further attributes:
@role: may be used to specify further information about the entity referenced by this name, for example the occupation of a person, or the status of a place.
@nymRef: provides a means of locating the canonical form (<nym>) of the names associated with the object named by the element bearing it.
Note: @nymRef is particularly important for multi-lingual examples:
<persName xml:lang="ar">
<forename nymRef="#nym1">شكري</forename>
<addName type="title" nymRef="#nym2">باشا</addName>
</persName>
<persName xml:lang="ota-Latn-x-ijmes">
<forename nymRef="#nym1">Şükrü</forename>
<addName type="title" nymRef="#nym2">Paşa</addName>
</persName>
<persName xml:lang="ar-Latn-EN">
<forename nymRef="#nym1">Shukri</forename>
<addName type="title" nymRef="#nym2">Pasha</addName>
</persName>
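The @nymRef values above need something to point at. A minimal sketch of a matching <listNym> follows, assuming the three spellings are simply recorded as parallel <form> elements. The arrangement of the forms inside each <nym> is an assumption made for illustration and is not taken from the original slides:

```xml
<listNym>
  <!-- nym1: the forename, one <form> per language or script (assumed layout) -->
  <nym xml:id="nym1">
    <form xml:lang="ar">شكري</form>
    <form xml:lang="ota-Latn-x-ijmes">Şükrü</form>
    <form xml:lang="ar-Latn-EN">Shukri</form>
  </nym>
  <!-- nym2: the title -->
  <nym xml:id="nym2">
    <form xml:lang="ar">باشا</form>
    <form xml:lang="ota-Latn-x-ijmes">Paşa</form>
    <form xml:lang="ar-Latn-EN">Pasha</form>
  </nym>
</listNym>
```

With such a list in place, all three <persName> variants resolve to the same two nyms, whatever the script.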
<p>... <name ref="#jsbach" type="person">Johann Sebastian Bach</name> the German composer was born in 1685... </p>
or
<p>The <orgName ref="entities:otc">Oriental Theatre Company</orgName> numbering 54 people, and under the direction of <persName ref="#pers_3">Mr. Butros Tanfous</persName> arrived this week at the <orgName ref="#org_usib">U.S. Immigration Bureau</orgName>. Several Oriental specialists from various parts of Turkey have been secured in order to give the American public a correct idea of the customs and manners of the people in different parts of the Empire.</p>
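The value entities:otc in the second example uses a private URI prefix rather than a ‘#’ pointer. One way such a prefix might be declared in the TEI header is with <prefixDef>. The file name entities.xml and the match pattern below are hypothetical and only illustrate the mechanism:

```xml
<encodingDesc>
  <listPrefixDef>
    <!-- hypothetical declaration: expands entities:otc to entities.xml#otc -->
    <prefixDef ident="entities" matchPattern="([a-z_0-9]+)" replacementPattern="entities.xml#$1">
      <p>Pointers with the prefix entities: refer to xml:id values in a separate authority file.</p>
    </prefixDef>
  </listPrefixDef>
</encodingDesc>
```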
Even within a single language, in a single document, there may be many ways of referencing the same person:
<persName>Leslie Gunston</persName>.... <persName>Leslie</persName> .... <rs>Wilfred's cousin</rs>
The @ref attribute can be used simply to link all references to a specified person:
<persName ref="#LG">Leslie Gunston</persName>....
<persName ref="#LG">Leslie</persName> ....
<rs ref="#LG">Wilfred's cousin</rs>
<!-- ... elsewhere -->
<person xml:id="LG">
<persName>Leslie Gunston</persName>
<!-- everything we want to say about Leslie -->
</person>
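‘Elsewhere’ could, for instance, be the <particDesc> in the TEI header. A minimal sketch, assuming the person records are collected in a <listPerson> there (other locations, such as the back matter, are equally possible):

```xml
<profileDesc>
  <particDesc>
    <listPerson>
      <!-- target of every ref="#LG" in the text -->
      <person xml:id="LG">
        <persName>Leslie Gunston</persName>
      </person>
    </listPerson>
  </particDesc>
</profileDesc>
```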
A bare <name> can be ambiguous. Is ‘Nancy’ here a person or a place?
<s>Jean likes <name ref="#NN123">Nancy</name></s>
Using a more precise element (<persName> or <placeName>) is one way of resolving the ambiguity; another is to follow the pointer:
<person xml:id="NN123">
<persName>
<forename>Nancy</forename>
<surname>Ide</surname>
</persName>
<!-- ... -->
</person>
or:
<place xml:id="NN123">
<placeName notBefore="1400">Nancy</placeName>
<placeName notAfter="0056">Nantium</placeName>
<!-- ... -->
</place>
<persName> elements
<person xml:id="pers_2">
<persName xml:lang="ar">
<addName type="title" nymRef="#nym1">الدكتور</addName>
<forename nymRef="#nym2">ابراهيم</forename>
<surname nymRef="#nym3">عربيلي</surname></persName>
<persName xml:lang="en">
<addName type="title" nymRef="#nym1">Dr.</addName>
<forename nymRef="#nym2">Abraham</forename>
<surname nymRef="#nym3">Arbeely</surname></persName>
</person>
<person xml:id="pers_3">
<persName xml:lang="ar">
<forename nymRef="#nym4">نجيب</forename>
<forename nymRef="#nym5">يوسف</forename>
<surname nymRef="#nym3">عربيلي</surname></persName>
<persName xml:lang="en">
<forename nymRef="#nym4">Najeeb</forename>
<forename nymRef="#nym5">Joseph</forename>
<surname nymRef="#nym3">Arbeely</surname></persName>
</person>
Not to mention: <roleName> (e.g. ‘Emperor’), <genName> (e.g. ‘the Elder’), <addName> (e.g. ‘Hammer of the Scots’), <nameLink> (a link between components, e.g. ‘van’) etc., all of which can carry @type attributes.
<persName> works well for Western names, but Arabic or Ottoman?
The canonical scheme of <surname> and <forename> is insufficient to mark up the components of personal names in pre-modern and/or non-Western contexts. How should we mark up the following names?
Soulah and Hassoun 2012 propose using the available elements <surname>, <forename>, and <addName> with a controlled vocabulary of @type and @subtype attributes.
<surname>: to encode the laqab evoking a real or assigned quality
<forename>: for the ism
<addName> with @type
I suggest adding the following values to the @type attribute of <addName>
<persName xml:lang="ar"> جزائري زاده الامير علي باشا ابن عبد القادر افندي الحسني</persName>
Could be marked up as:
<persName xml:lang="ar">
<addName type="nisbah">جزائري</addName>
<addName type="honorific" xml:lang="ota">زاده</addName>
<addName type="title">الامير</addName>
<forename>علي</forename>
<addName type="title" xml:lang="ota">باشا</addName>
<addName type="patronym">ابن
<forename>عبد القادر</forename>
<addName type="title" xml:lang="ota">افندي</addName>
</addName>
<surname type="laqab">الحسني</surname>
</persName>
<placeName> (names can be made up of other names)
<geogName>: a name associated with some geographical feature such as a mountain or river
<geogFeat>: a term for some particular kind of geographical feature, e.g. ‘Mount’, ‘Lake’
For example:
<placeName>
<geogFeat>Mont</geogFeat>
<geogName>Blanc</geogName>
</placeName>
<bloc>: name of a geo-political unit consisting of two or more nation states or countries.
<country>: name of a geo-political unit, such as a nation, country, colony, or commonwealth, larger than or administratively superior to a region and smaller than a bloc.
<region>: name of an administrative unit such as a state, province, or county, larger than a settlement, but smaller than a country.
<settlement>: name of a settlement such as a city, town, or village identified as a single geo-political or administrative unit.
<district>: contains the name of any kind of subdivision of a settlement, such as a parish, ward, or other administrative or geographic unit.
<date> element
Temporal information can be encoded with:
<date>: contains a date in any format.
<time>: contains a phrase defining a time of day in any format.
Example:
<div type="article" xml:lang="en">
<head xml:lang="ar">المرمح الحميدي</head>
<head xml:lang="en">The Hamidieh Hipodrome</head>
<ab rend="center">---</ab>
<p>At the <orgName>U.S. Immigration Bureau</orgName> the steamer <orgName>Cyntiana</orgName> whitch sailed from <placeName>Beyrouth</placeName> on the <date when="1893-03-29">29th of March</date> arrived <date when="1893-04-24">Monday evening, April the 24th, <time>at 7 P.M.</time></date> She brought over 12 first-class passengers and 262 steerage including the horsemen, performers and attendants of the <orgName>Hamidieh Hipodrome Company</orgName> to which we made reference in out last issue, promissing to write a special article on its arrival.</p>
</div>
All the elements above are ‘datable’ and so can be associated with a more or less exact date or date range using any combination of the following attributes (class att.datable):
@when: supplies the value of a date or time in a standard form
@notBefore: specifies the earliest possible date for the event in standard form
@notAfter: specifies the latest possible date for the event in standard form
@from: indicates the starting point of the period in standard form
@to: indicates the ending point of the period in standard form
Similar to the conceptualisation of personal names, current dating standards favour the contemporary Western model, i.e. without further specification all dating attributes refer to the Gregorian calendar.
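A brief sketch of how the att.datable attributes listed above combine. The sentences and dates are invented for illustration and do not come from the newspaper:

```xml
<!-- @from and @to: a period with known start and end -->
<p>The company performed <date from="1893-05" to="1893-10">from May to October 1893</date>.</p>
<!-- @notBefore and @notAfter: a single point in time that can only be bounded -->
<p>The letter was written <date notBefore="1893-04-24" notAfter="1893-04-30">within a week of the arrival</date>.</p>
```

@from and @to describe a period, whereas @notBefore and @notAfter bound a single uncertain point in time.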
All other calendars, in our case hijrī, mālī, and rūmī, should be declared and documented using the <calendarDesc> in the <profileDesc> in the TEI header. They can then be referenced through:
@calendar: indicates the system or calendar to which the date represented by the content of this element belongs.
@datingMethod: supplies a pointer to a <calendar> element or other means of interpreting the values of the custom dating attributes:
att.datable.custom: @when-custom, @notBefore-custom etc.
<calendar xml:id="cal_islamic">
<p>Islamic <hi>hijrī</hi> calendar: lunar calendar beginning the Year with 1 Muḥarram. Dates differ between locations as the beginning of the month is based on sightings of the new moon.</p>
<p>E.g. <date calendar="#cal_islamic" datingMethod="#cal_islamic" when="1841-05-23" when-custom="1257-04-01">1 Rab II 1257, Sunday</date>, <date calendar="#cal_islamic" datingMethod="#cal_islamic" when="1908-03-05" when-custom="1326-02-01">1 Ṣaf 1326, Thursday</date>.</p>
</calendar>
Note: The official XPath specifications have a bug that prevents the computation of Islamic hijrī dates. To remedy this and other issues, I wrote a number of XSLT stylesheets for converting dates between the four calendars in use in the Ottoman Empire, which can be found on GitHub (https://github.com/tillgrallert/xslt-calendar-conversion).
<calendar xml:id="cal_julian">
<p>Reformed Julian calendar beginning the Year with 1 January. In the Ottoman context usually referred to as <hi>rūmī</hi>. Arabic newspapers usually labelled this calendar as <hi>sharqī</hi>.</p>
<p>All solar calendars add an intercalated 366th day every fourth (and, in the case of Gregorian and rūmī calendars, even-numbered) year at the end of February (the last day of the old Julian calendar). The Gregorian calendar suppresses this rule in centesimal years that cannot be divided by 400. This difference creates a growing offset between Gregorian and Julian calendars: while 1900 R was a leap year, 1900 was not, which in turn caused the difference between the Gregorian calendar, on the one hand, and the <hi>mālī</hi> and <hi>rūmī</hi> calendars, on the other, to grow from 12 to 13 days from 29 Shubāṭ (February) 1900 R / 1315 M (13 March 1900) onwards.</p>
<p>E.g. <date calendar="#cal_julian" datingMethod="#cal_julian" when="1841-05-23" when-custom="1841-05-11">11 Ayyār 1841, Sunday</date>, <date calendar="#cal_julian" datingMethod="#cal_julian" when="1908-03-05" when-custom="1908-02-21">21 Shub 1908, Thursday</date>.</p>
</calendar>
<calendar xml:id="cal_ottomanfiscal">
<p>Ottoman fiscal calendar: a lunisolar calendar. It is based on the Old Julian calendar beginning the Year with 1 March. Introduced as a fiscal calendar in 1676 and in the Ottoman context usually referred to as <hi>mālī</hi> and sometimes, confusingly, also as <hi>rūmī</hi>. Every 33 lunar years, a <hi>hijrī</hi> year would complete within a single solar <hi>mālī</hi> year. In this case the counting of the <hi>mālī</hi> years skipped a year to catch up with the faster <hi>hijrī</hi> calendar. Due to a printing error in the coupon booklets for the consolidated debt repayment program for 1872 (1288 M instead of 1289 M), synchronisation of <hi>mālī</hi> and <hi>hijrī</hi> years was henceforth abolished. As <hi>mālī</hi> years began with 1 March, <hi>mālī</hi> leap years preceded their <hi>rūmī</hi> and Gregorian counterpart (the leap year 1315 M commenced on 13 March 1899).</p>
<p>E.g. <date calendar="#cal_ottomanfiscal" datingMethod="#cal_ottomanfiscal" when="1841-05-23" when-custom="1257-03-11">11 Māyis 1257, Sunday</date>, <date calendar="#cal_ottomanfiscal" datingMethod="#cal_ottomanfiscal" when="1908-03-05" when-custom="1323-12-21">21 Shub 1323, Thursday</date>.</p>
</calendar>
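Put together, the three declarations above would sit in the TEI header roughly as follows. This is only a skeleton: the descriptive paragraphs are shortened and the surrounding header elements are omitted.

```xml
<profileDesc>
  <calendarDesc>
    <calendar xml:id="cal_islamic">
      <p>Islamic hijrī calendar</p>
    </calendar>
    <calendar xml:id="cal_julian">
      <p>Reformed Julian (rūmī) calendar</p>
    </calendar>
    <calendar xml:id="cal_ottomanfiscal">
      <p>Ottoman fiscal (mālī) calendar</p>
    </calendar>
  </calendarDesc>
</profileDesc>
<!-- in the text: @calendar identifies the calendar of the date as written,
     @datingMethod the calendar used for @when-custom -->
<date calendar="#cal_islamic" datingMethod="#cal_islamic" when="1908-03-05" when-custom="1326-02-01">1 Ṣaf 1326</date>
```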
Now let’s do an exercise where we mark up entities in the newspaper texts using <persName>, <placeName>, <orgName>, and <date> with their various attributes.