Tei@DHSI 7 — Ontologies of named entities and linking

Till Grallert

4 Jun 2015

General notes

Names refer to (named) entities. Information describing entities in detail can be kept in ontologies in the <profileDesc> of the TEI header (cf. our session on metadata). Names in the text are then linked to these entries by means of @ref attributes.
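
As a minimal sketch of such a link: the sentence below is invented for illustration, and the target VM1893 is assumed to be the xml:id of a <person> entry kept in the header.

<!-- hypothetical sentence; #VM1893 is assumed to identify a <person> in the header -->
<p>The evening closed with a reading by <persName ref="#VM1893">Mayakovsky</persName>.</p>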

What can we say about named entities?

Potentially, quite a lot…

<person xml:id="VM1893">
    <persName xml:lang="ru">Владимир Владимирович Маяковский</persName>
    <persName xml:lang="fr">Wladimir Maïakowski</persName>
    <birth when="1893-07-19">7 July (OS) 1893,
        <placeName ref="#BGDT" xml:lang="en">
            <settlement>Baghdati</settlement>, <country>Georgia</country>
        </placeName>
    </birth>
    <death when="1930-04-14"/>
    <occupation>Poet and playwright</occupation>
    <note>Among the foremost representatives of early-20th century Russian Futurism.</note>
</person>

What elements should the TEI provide for such purposes?

Traits, states, and events

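As elsewhere in the TEI, we resolve this question by adding a layer of abstraction. We distinguish three classes of information: traits (characteristics that are typically independent of an entity's own actions), states (characteristics that hold true only for a limited period), and events (occurrences that change a state or, less often, a trait).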

All these elements are members of the att.datable class and thus can have time/dating attributes.
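
The dating attributes in question can be sketched as follows. The element chosen and the years are placeholders for illustration only, not data from any source.

<!-- hypothetical values throughout -->
<residence when="1900">…</residence> <!-- a single point in time -->
<residence from="1900" to="1910">…</residence> <!-- a period with known start and end -->
<residence notBefore="1900" notAfter="1910">…</residence> <!-- earliest and latest possible dates -->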


Some typical states for a person


Some typical traits of a person

Some typical traits of a place:

Some of these (e.g. sex) have normalised attributes, but mostly they contain free text descriptions.
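
A sketch of the difference: <sex> carries a normalised @value next to its free-text content, while an element such as <occupation> simply holds prose. The code M is an assumption; which values are permitted depends on the scheme a project adopts.

<sex value="M">male</sex> <!-- M is an assumed code -->
<occupation>Poet and playwright</occupation>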


For persons, only two specific event elements are defined: <birth> and <death>. Anything else must be defined using the generic <event> element and its @type attribute.

<person xml:id="pers_3">
    <persName xml:lang="ar">
        <forename>نجيب</forename> <forename>يوسف</forename> <surname>عربيلي</surname></persName>
    <persName xml:lang="ar-Latn-EN">
        <forename>Najeeb</forename> <forename xml:lang="en">Joseph</forename> <surname>Arbeely</surname></persName>
    <birth>
        <date when="1861">1861</date>, in <placeName>Damascus</placeName>
    </birth>
    <death>
        <date when="1904">1904, February</date>, in <placeName>New York</placeName>
    </death>
    <event when="1878">
        <p>Migration to the <placeName>USA</placeName></p>
    </event>
    <state from="1885">
        <p>American consul in <placeName>Jerusalem</placeName></p>
    </state>
    <state notBefore="1886">
        <p>Inspector in the <orgName>Bureau of Immigration</orgName> at the port of <placeName>New York</placeName></p>
    </state>
    <state from="1892-04-15" xml:lang="en">
        <p>Editor of <orgName>Kawkab America</orgName>.</p>
    </state>
</person>
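
Since the generic <event> relies on @type to say what kind of event is meant, the migration above could be typed explicitly. This is a sketch only: the value migration is an assumption, as the TEI does not prescribe a list of event types.

<!-- type="migration" is an assumed, project-specific value -->
<event type="migration" when="1878">
    <p>Migration to the <placeName>USA</placeName></p>
</event>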

A place as being defined by its location

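The <location> element can contain geographic coordinates (<geo>), components of a place name such as <settlement>, <region>, or <country>, or a relative position expressed with <offset>.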


<place type="neighbourhood" xml:id="ltg000001">
    <placeName xml:lang="ar-Latn-x-ijmes">Bāb al-Jābiyya</placeName>
    <placeName xml:lang="ar">باب الجابية</placeName>
    <settlement xml:lang="ar" type="city">دمشق الشام</settlement>
    <region xml:lang="ota" type="province" notAfter="1918-10-01">ولاية سورية</region>
    <location>
        <geo>33.507628, 36.301395</geo>
    </location>
</place>

Places can self-nest

<place type="state">
    <placeName xml:lang="en">Ottoman Empire</placeName>
    <placeName xml:lang="ar">الدولة العثمانية العالية</placeName>
    <place type="province">
        <placeName notAfter="1918-10-01" xml:lang="ota">ولاية سورية</placeName>
        <placeName notAfter="1918-10-01" xml:lang="en">Province of Syria</placeName>
        <place type="city">
            <placeName type="city" xml:lang="ar">دمشق الشام</placeName>
            <placeName type="city" xml:lang="en">Damascus</placeName>
            <place type="neighbourhood">
                <placeName xml:lang="ar-Latn-x-ijmes">Bāb al-Jābiyya</placeName>
                <placeName xml:lang="ar">باب الجابية</placeName>
                <location>
                    <geo>33.507628, 36.301395</geo>
                </location>
            </place>
        </place>
    </place>
</place>

Organizational names

Organizations have names, too. An organization is any named collection of people regarded as a single unit. An <orgName> in the text can point back to an <org> in the header.

<p>it is debated <date notAfter="1908-10-01">now</date> among ‘<orgName ref="#CUP">Young Turkey</orgName>’ adherents whether it would be right to punish the officials who were led to bribery by the littleness of their pay &amp; its frequent irregularity.</p>

<org xml:id="CUP">
    <!-- Information about the organization -->
    <orgName xml:lang="en">Committee of Union and Progress</orgName>
</org>

All entities can be fictional

<place type="imaginary">
    <location>
        <offset>fifty leagues beyond</offset>
        <placeName>Pillars of …</placeName>
    </location>
</place>

Personal relationships


<person xml:id="pers_2">
    <persName xml:lang="en"><addName type="title">Dr.</addName> <forename>Abraham</forename> <surname>Arbeely</surname></persName>
    <!-- ... -->
</person>
<person xml:id="pers_3">
    <persName xml:lang="en"><addName type="title">Prof.</addName> <forename>Abraham</forename> <forename>Joseph</forename> <surname>Arbeely</surname></persName>
    <!-- ... -->
</person>
<person xml:id="pers_4">
    <persName xml:lang="en"><forename>Najeeb</forename> <forename>Joseph</forename> <surname>Arbeely</surname></persName>
    <!-- ... -->
</person>
<relationGrp type="children">
    <relation name="parent" active="#pers_4" passive="#pers_2 #pers_3"/>
</relationGrp>
<relationGrp type="siblings">
    <relation name="sibling" mutual="#pers_2 #pers_3"/>
</relationGrp>


The elements <listNym> and <nym> are used to document the canonical form of a name or name-component.


<nym xml:id="nym-F-737">
    <form xml:lang="ar">شكري</form>
    <form xml:lang="ar-Latn-EN">Shukri</form>
    <form xml:lang="ar-Latn-x-ijmes">Shukrī</form>
    <form xml:lang="tr">Şükrü</form>
</nym>
<nym xml:id="nym-F-406">
    <form xml:lang="ar">يوسف</form>
    <form xml:lang="ar-Latn-EN">Yusef</form>
    <form xml:lang="ar-Latn-FR">Youssouf</form>
    <form xml:lang="ar-Latn-x-ijmes">Yūsuf</form>
    <form xml:lang="de">Josef</form>
    <form xml:lang="en">Joseph</form>
    <form xml:lang="tr">Yusuf</form>
</nym>
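
Name components can then point to their canonical form. A sketch, assuming the @nymRef attribute is used for this and that the identifier nym-F-406 is reachable from where the name is encoded:

<!-- assumes #nym-F-406 resolves to the <nym> for Yūsuf / Joseph -->
<persName xml:lang="ar-Latn-EN">
    <forename>Najeeb</forename> <forename nymRef="#nym-F-406" xml:lang="en">Joseph</forename> <surname>Arbeely</surname>
</persName>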