Till Grallert
3 Jun 2015
The slides are based on those supplied by the various Digital Humanities Summer Schools at the University of Oxford under the Creative Commons Attribution license and have been adapted to the needs of the 2015 Introduction to TEI at DHSI.
Slides were produced using MultiMarkdown, Pandoc, Slidy JS, and the Snippet jQuery Syntax highlighter.
The XSLT language is
It was designed to generate XSL FO, but is now widely used to generate HTML.
Take this:
<persName>
<forename>Milo</forename>
<surname>Casagrande</surname>
</persName>
<persName>
<forename>Corey</forename>
<surname>Burger</surname>
</persName>
<persName>
<forename>Naaman</forename>
<surname>Campbell</surname>
</persName>
and make this:
<item n="1">
<name>Burger</name>
</item>
<item n="2">
<name>Campbell</name>
</item>
<item n="3">
<name>Casagrande</name>
</item>
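One way this first transformation could be written is sketched below. It assumes that the three <persName> elements share a parent, for which <p> is a hypothetical choice, and it uses <xsl:sort>, an instruction these slides do not otherwise cover.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <!-- Assumption: the <persName> elements are children of a <p> -->
  <xsl:template match="p">
    <xsl:for-each select="persName">
      <!-- xsl:sort must come first inside xsl:for-each; it orders the names by surname -->
      <xsl:sort select="surname"/>
      <!-- position() is the place of the current name in the sorted sequence -->
      <item n="{position()}">
        <name>
          <xsl:value-of select="surname"/>
        </name>
      </item>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The curly braces in n="{position()}" ask the processor to evaluate the expression instead of copying it literally.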
Take this:
<div n="34" type="recipe">
<head>Pasta for beginners</head>
<list>
<item>Pasta</item>
<item>Grated cheese</item>
</list>
<p>Cook the pasta and mix with the cheese</p>
</div>
and make this:
<html>
<h1>34: Pasta for beginners</h1>
<p>Ingredients: Pasta Grated cheese</p>
<p>Cook the pasta and mix with the cheese</p>
</html>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:template match="div">
<html>
<h1>
<xsl:value-of select="@n"/>:
<xsl:value-of select="head"/></h1>
<p>Ingredients:
<xsl:apply-templates select="list/item"/></p>
<p>
<xsl:value-of select="p"/>
</p>
</html>
</xsl:template>
</xsl:stylesheet>
Note: the namespace declaration linking ‘xsl:’ to ‘http://www.w3.org/1999/XSL/Transform’
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:template match="div">
<!-- .... do something with div elements....-->
</xsl:template>
<xsl:template match="p">
<!-- .... do something with p elements....-->
</xsl:template>
</xsl:stylesheet>
div and p are XPath expressions, which specify which bit of the document is matched by the template.
<xsl:apply-templates select="XX"/> looks for templates which match element “XX”; <xsl:value-of select="XX"/> simply gets any text from that element.
Our examples and exercises all start with two important attributes on <stylesheet>:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.tei-c.org/ns/1.0"
version="2.0">
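This indicates that element names without a prefix in our XPath expressions are taken to be in the TEI namespace, and that the stylesheet uses version 2.0 of XSLT.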
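For comparison, here is a sketch of the same kind of stylesheet without xpath-default-namespace. The TEI namespace is then bound to a prefix, and every TEI element name in a pattern or expression has to carry it. The prefix tei is a common convention but still an arbitrary choice; exclude-result-prefixes keeps the unused declaration out of the output.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="tei">
  <!-- Without xpath-default-namespace, match="div" would only match
       <div> elements in no namespace, so the prefix is required -->
  <xsl:template match="tei:div">
    <h2>
      <xsl:value-of select="tei:head"/>
    </h2>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```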
<text>
<front>
<div>
<p>Material up front</p>
</div>
</front>
<body>
<div>
<head>Introduction</head>
<p rend="it">Some sane words</p>
<p>Rather more surprising words</p>
</div>
</body>
<back>
<div>
<p>Material in the back</p>
</div>
</back>
</text>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:template match="/">
<html>
<xsl:apply-templates/>
</html>
</xsl:template>
<xsl:template match="TEI">
<xsl:apply-templates select="text"/>
</xsl:template>
<xsl:template match="text">
<h1>FRONT MATTER</h1>
<xsl:apply-templates select="front"/>
<h1>BODY MATTER</h1>
<xsl:apply-templates select="body"/>
</xsl:template>
</xsl:stylesheet>
Templates for paragraphs and headings
<xsl:template match="p">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
<xsl:template match="div">
<h2>
<xsl:value-of select="head"/>
</h2>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="div/head"/>
Notice how we avoid getting the heading text twice. Why did we need to qualify it to deal with just <head> inside <div>?
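The same idea of qualifying a match pattern works with predicates. As a minimal sketch, assuming that rend="it" in the sample text means italics and that an HTML <i> element is acceptable output (neither is stated in the slides), a more specific template can take over for just those paragraphs:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <!-- General rule for every paragraph -->
  <xsl:template match="p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <!-- Qualified rule: only paragraphs whose rend attribute equals 'it'.
       A pattern with a predicate outranks the plain match="p" by default. -->
  <xsl:template match="p[@rend='it']">
    <p>
      <i>
        <xsl:apply-templates/>
      </i>
    </p>
  </xsl:template>
</xsl:stylesheet>
```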
The @select attribute can point to any part of the document. Using XPath expressions, we can find:
expression | meaning |
---|---|
/ | the root of document (outside the root element) |
* | any element |
text() | only the text content of a node |
name | an element called name |
@name | an attribute called name |
Example of a complete path in <value-of>:
<xsl:value-of select="/TEI/teiHeader/fileDesc/titleStmt/title"/>
XPath is the basis of most other XML querying and transformation languages.
<body n="anthology">
<div type="poem">
<head>The SICK ROSE </head>
<lg type="stanza">
<l n="1">O Rose thou art sick.</l>
<l n="2">The invisible worm,</l>
<l n="3">That flies in the night </l>
<l n="4">In the howling storm:</l>
</lg>
<lg type="stanza">
<l n="5">Has found out thy bed </l>
<l n="6">Of crimson joy:</l>
<l n="7">And his dark secret love </l>
<l n="8">Does thy life destroy.</l>
</lg>
</div>
</body>
[Figures: XPath exercises 01 to 26]
/div/lg[1]/l
l/../../head
axisname::nodetest[predicate]
child::div[contains(head, 'ROSE')]
axis | meaning |
---|---|
self:: | Contains the current node |
attribute:: | Contains all attributes of the current node |
parent:: | Contains the parent of the current node |
ancestor:: | Contains all ancestors (parent, grandparent, etc.) of the current node |
ancestor-or-self:: | Contains the current node plus all its ancestors (parent, grandparent, etc.) |
child:: | Contains all children of the current node |
descendant:: | Contains all descendants (children, grandchildren, etc.) of the current node |
descendant-or-self:: | Contains the current node plus all its descendants (children, grandchildren, etc.) |
following:: | Contains everything in the document after the closing tag of the current node |
following-sibling:: | Contains all siblings after the current node |
preceding:: | Contains everything in the document that is before the starting tag of the current node |
preceding-sibling:: | Contains all siblings before the current node |
ancestor::lg = all <lg> ancestors
ancestor-or-self::div = all <div> ancestors or current
attribute::n = n attribute of current node
child::l = <l> elements directly under current node
descendant::l = <l> elements anywhere under current node
descendant-or-self::div = all <div> descendants or current
following-sibling::l[1] = next <l> element at this level
preceding-sibling::l[1] = previous <l> element at this level
self::head = current <head> element
child::lg[attribute::type='stanza']
child::l[@n='4']
child::div[position()=3]
child::div[4]
child::l[last()]
child::lg[last()-1]
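To see predicates at work, here is a small sketch that applies a few of them to the Sick Rose sample. It assumes the sample sits in the TEI namespace; the HTML list used for output is a hypothetical choice made only for illustration.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <xsl:template match="div[@type='poem']">
    <ul>
      <!-- The line numbered 4 inside the first stanza -->
      <li>
        <xsl:value-of select="child::lg[1]/child::l[@n='4']"/>
      </li>
      <!-- The last line of the last stanza (line 8 in the sample) -->
      <li>
        <xsl:value-of select="child::lg[last()]/child::l[last()]"/>
      </li>
      <!-- The number of lines in the second-to-last stanza -->
      <li>
        <xsl:value-of select="count(child::lg[last()-1]/child::l)"/>
      </li>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

A number in square brackets, as in lg[1], is shorthand for a test on position(), which is why child::div[4] and child::div[position()=4] select the same element.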
An omitted axis is the same as child::, so lg is short for child::lg
@ is the same as attribute::, so @type is short for attribute::type
. is the same as self::node(), so ./head is short for self::node()/child::head
.. is the same as parent::node(), so ../lg is short for parent::node()/child::lg
// is the same as /descendant-or-self::node()/, so div//l is short for child::div/descendant-or-self::node()/child::l
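The equivalence of the short and long forms can be checked with a sketch like the following, which counts the verse lines of the Sick Rose sample twice, once with each notation. It assumes the sample sits in the TEI namespace and that <body> is the context node.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <xsl:template match="body">
    <p>
      <!-- Abbreviated syntax -->
      <xsl:value-of select="count(div//l)"/>
      <xsl:text> = </xsl:text>
      <!-- The same path written out with explicit axes -->
      <xsl:value-of select="count(child::div/descendant-or-self::node()/child::l)"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

Both expressions select the same eight <l> elements, so the two numbers in the output agree.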
Compare
<xsl:template match="head"> .... </xsl:template>
with
<xsl:template match="div/head"> ... </xsl:template>
<xsl:template match="figure/head"> ....</xsl:template>
It is possible for it to be ambiguous which template is to be used:
<xsl:template match="person/name">... </xsl:template>
<xsl:template match="name">... </xsl:template>
Which template is used when the processor meets a <name> element?
There is a @priority attribute on <template>; the higher the value, the more inclined the XSLT engine is to use it:
<xsl:template match="name" priority="1">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="person/name" priority="2">
A name
</xsl:template>
The normal rule is that the most specific template is applied.
<xsl:template match="*">
<!-- ... -->
</xsl:template>
<xsl:template match="tei:*">
<!-- ... -->
</xsl:template>
<xsl:template match="p">
<!-- ... -->
</xsl:template>
<xsl:template match="div/p">
<!-- ... -->
</xsl:template>
<xsl:template match="div/p/@n">
<!-- ... -->
</xsl:template>
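Behind the "most specific" rule are default priorities that the processor assigns to each pattern when no @priority is given. The sketch below writes out the XSLT 2.0 defaults explicitly for some of the patterns above, so it should behave the same as a stylesheet that leaves the attribute off. The tei prefix is assumed to be bound to the TEI namespace.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <!-- Any element at all: lowest default priority -->
  <xsl:template match="*" priority="-0.5">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Any element in one particular namespace -->
  <xsl:template match="tei:*" priority="-0.25">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- A single element name -->
  <xsl:template match="p" priority="0">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Anything with more than one step or with a predicate -->
  <xsl:template match="div/p" priority="0.5">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```

Note that div/p and div/p/@n would both receive 0.5; they never compete because one matches elements and the other matches attributes.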
XSLT stylesheets can be characterized as being of two types:
push: there is a separate template for each kind of element, the templates hand over to one another via <xsl:apply-templates>, and the overall result is assembled from bits in each template. It is sometimes hard to visualize the final design. Common for document-oriented processing where the input document structure varies.
pull: there is a master template (usually matching /) with the main structure of the output, and specific <xsl:for-each> or <xsl:value-of> commands to grab what is needed for each part. The templates tend to get large and unwieldy. Common for data-oriented processing where the structure is fixed.
How can we turn this:
<ref target="http://www.oucs.ox.ac.uk/">OUCS</ref>
into that:
<a href="http://www.oucs.ox.ac.uk/">OUCS</a>
if the following does not work:
<xsl:template match="ref">
<a href="@target">
<xsl:apply-templates/>
</a>
</xsl:template>
as it will produce:
<a href="@target">OUCS</a>
Instead we have two options to give the @href attribute whatever value the @target attribute has:
Use {} to indicate that the expression must be evaluated:
<xsl:template match="ref">
<a href="{@target}">
<xsl:apply-templates/>
</a>
</xsl:template>
Use <xsl:attribute>:
<xsl:template match="ref">
<a>
<xsl:attribute name="href" select="@target"/>
<xsl:apply-templates/>
</a>
</xsl:template>
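The select attribute on <xsl:attribute> was introduced with XSLT 2.0. A sketch of the longer form, which should also work in XSLT 1.0 processors, puts the value into the content of the instruction:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <xsl:template match="ref">
    <a>
      <!-- The attribute value is whatever the content of xsl:attribute produces -->
      <xsl:attribute name="href">
        <xsl:value-of select="@target"/>
      </xsl:attribute>
      <!-- Attributes must be created before any child content of <a> -->
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```

This form is mainly useful when the value has to be assembled from several instructions or conditions; for a single expression the {} notation is shorter.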
If we want to avoid lots of templates, we can do in-line looping over a set of elements. For example:
<xsl:template match="listPerson">
<ul>
<xsl:for-each select="person">
<li>
<xsl:value-of select="persName"/>
</li>
</xsl:for-each>
</ul>
</xsl:template>
compare to:
<xsl:template match="listPerson">
<ul>
<xsl:apply-templates select="person"/>
</ul>
</xsl:template>
<xsl:template match="person">
<li>
<xsl:value-of select="persName"/>
</li>
</xsl:template>
We can make code conditional on a test being passed. The @test can use any XPath facilities:
<xsl:template match="person">
<xsl:if test="@sex='2'">
<li>
<xsl:value-of select="persName"/>
</li>
</xsl:if>
</xsl:template>
compare to:
<xsl:template match="person[@sex='2']">
<li>
<xsl:value-of select="persName"/>
</li>
</xsl:template>
<xsl:template match="person"/>
We can make a multi-value choice conditional on what we find in the text:
<xsl:template match="person">
<xsl:apply-templates/>
<xsl:choose>
<xsl:when test="@sex='1'">(male) </xsl:when>
<xsl:when test="@sex='2'">(female) </xsl:when>
<xsl:when test="not(@sex)">(no sex specified) </xsl:when>
<xsl:otherwise>(unknown sex)</xsl:otherwise>
</xsl:choose>
</xsl:template>
Now you can
And we are going to put this knowledge to use on our XML files (excited gasps as you are about to program!)