Till Grallert

Introduction to Plain Text Workflows and Sustainable Publishing

2017-02-22T16:13:32+00:00

DRAFT

Disclaimer 1: many of the ideas have been inspired by the work of Alex Gil (@elotroalex) and Dennis Tenen (@dennistenen)¹ and discussions with the two of them and others at DHI Beirut, THATCamp Beirut, and DHSI.

Disclaimer 2: I am teaching short workshops on some of the ideas outlined in this post at Digital Humanities Institute - Beirut on 10 March 2017 and at DH Abu Dhabi on 10 April 2017. Basic slides are available here and here.

Intro

In the world of (academic) publishing, large aggregators and indexers have turned into and acquired publishing presses and generate obscene profits by charging the public (every tax-payer worldwide) multiple times over. First by charging the predominantly publicly-funded academic for publishing the results of her publicly-funded research and by enforcing a culture of pro bono labour among academic reviewers and editors; second by selling this content to equally predominantly publicly-funded libraries, which then increasingly demand access fees from members of the public, who want to access their collections; and third by offloading the cost of long-term preservation to, again, publicly-funded institutions. This system not only created a hierarchy of academics and institutions in the relatively well-off “West”—two classes divided by their ability to pay for being published and accessing publications (their own and others). It also increasingly prevents anybody outside western academia from accessing cutting-edge research and participating in intellectual discourse.

One of the opportunities afforded by the digital humanities and the stated goal of this endeavour is to remove the middlemen—be they technical or entrepreneurial—between authors, readers, and the library-cum-archive. There are two main obstacles to this aim:

copyright laws and
the alienation of us, authors and academics, from the means of production.

We will not be able to change copyright legislation and the vested business interests in sustaining and expanding the regime of profit-generating copy and distribution rights in the foreseeable future, but we can all provide our knowledge under a Creative Commons licence. In order to do so, we need to re-claim the means of production. We argue that by doing so the main argument for restrictive copyright (namely, the provision of allegedly expensive services such as quality control and metadata curation) collapses. At the time of writing, the costs of printing and of globally distributing heavy and voluminous books are already negligible, as the main avenue of scholarly publication, the journal, has already moved to digital online publication.

Principles

The main principles in our effort to (re)claim the means of production are: accessibility, simplicity, sustainability, and credibility. They shall pertain both to the intellectual endeavour and to the tools employed.

Accessibility: accessible means first of all free and open to use and re-purpose for all who have the technical ability to do so. Therefore we will have to forfeit all proprietary software and formats. From the imperative of openness and accessibility derive the second and third principles:
Simplicity: apart from the necessity to be able to read and write in a specific human language, there should be as few tool, hardware, and software requirements as possible. Ideally everything should work on ten-year-old hardware with nothing but the software packaged with the operating system; even better, the core technologies should work without a computer.
Sustainability: only simple systems adhering to widely accepted standards can be sustained with minimal / reasonable effort.
Credibility: Credibility is at the core of scholarly production. In addition to transparency as to the sources, methodology, and tools used, authorship needs to be ascertained and acknowledged as the main tool of scholarly quality control.

Tools / formats

1. Writing

Looking at the most broadly-employed software in academic contexts and beyond, Microsoft’s Word, a piece of bloated and expensive proprietary software, what are the functions we need to replace?

1. composition / authoring

To avoid overly complex software and proprietary formats, form and content must be separated. Structural / semantic and representational information as well as metadata must be embedded in the text itself in order to be inseparable. We suggest using plain text with rudimentary markup following the conventions of MarkDown and a short block of metadata written in Yaml as the format of choice.

Format: plain text. At their core, all files are simple strings of letters. In the case of plain text files (TXT), this string of letters happens to be human readable. Plain text has been with us since the early days of computing and TXT files could be viewed and edited with 1980s hard- and software. We can therefore assume that this basic format will remain accessible for the years to come.
Syntax: MarkDown and its derivatives provide a simple but formal way of providing structural (headings, lists, notes, block quotes etc.) and formatting (italics, bold) information that is both human and machine readable. Think of it as the old way of underlining a line of text on a type-writer to indicate a heading.²
Metadata: Yaml (Yaml ain’t markup language) provides a simple way of including structured metadata (data about data) at the beginning of a text file, recording information on authors, title, date etc.
Tools: All major operating systems come with basic text editors sufficient for reading and editing plain text files (“Notepad” on Windows, “TextEdit” on Mac{» comment on Linux«}). However, some additional features, such as syntax highlighting and basic customisation of the writing interface (you have to look at it for substantial periods of time), will significantly enhance the writing experience.
- Sublime Text (Windows, Mac, Linux)
- NotePad++ (Windows only)
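To make the combination of plain text, MarkDown, and Yaml more concrete, here is a minimal sketch in Python of what such a file could look like and how little is needed to read it. The sample document, its field names (title, author, date), and the helper `split_front_matter` are illustrative assumptions; the helper only understands flat `key: value` lines and is not a full Yaml parser.

```python
SAMPLE = """---
title: Introduction to Plain Text Workflows
author: Till Grallert
date: 2017-02-22
---

# Intro

Plain text with *rudimentary* markup.
"""


def split_front_matter(text):
    """Split a document into (metadata dict, MarkDown body).

    Only flat `key: value` lines are understood; this is not a Yaml parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no metadata block at the top of the file
    try:
        end = lines.index("---", 1)  # closing delimiter of the metadata block
    except ValueError:
        return {}, text  # unterminated block: treat everything as body
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body


metadata, body = split_front_matter(SAMPLE)
print(metadata["title"])
print(body.splitlines()[0])
```

The point of the sketch is that both the metadata and the text remain legible without any tool at all; the code merely shows that a machine can recover the same structure with a few lines of standard-library Python.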

2. ascertaining authorship, version control, and archiving

Writing is a process and subject to change. We need to be able to try out different structures and formulations and, more often than one would like to, we discover that yesterday’s deletions would have been worth keeping. Not to mention an external editor or collaborators that quickly make any approach involving ever-longer file names futile (we all have our folders full of text.docx, text-new-version.docx, text-new-version-2004-01-01.docx, text-new-version-2004-01-01-comments-by-tg-2.docx etc.).

git: git is an open version control system (vcs) that works with any file type. It traces every change with clear information on author and a timestamp.
GitHub: GitHub is probably the most popular hosting and code-sharing platform based on git, which is a distributed version control system (dvcs).³ While GitHub is a commercial company, it offers free accounts and unlimited public repositories to everyone and free private repositories to academics. Once a change has been committed to a GitHub repository, authors have a public proof of their authorship with a unique identifier that can be referenced. In addition, the repository forms a redundant online back-up of your work.
- ssh (secure shell): in order to communicate with a server without local client software, some knowledge of ssh is necessary. To avoid this, one can use any of the numerous GitHub clients.⁴
Zenodo: Zenodo is an open science platform developed and operated by CERN. Some of its many features are the provision of DOIs (digital object identifiers) and long-term storage thus providing stable links and protection against link-rot. It also hooks into GitHub.
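The unique identifiers mentioned above are, in git, hashes derived from content. As a rough illustration, the following Python sketch reproduces how git computes the identifier of a single file's content (a "blob") under its default SHA-1 object format. The function name `git_blob_id` and the sample sentences are assumptions made for this example; commit identifiers are computed in a similar fashion over a larger object that also records author and timestamp, which is not shown here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a file with this content."""
    # git hashes a small header ("blob", the byte length, a NUL byte)
    # followed by the raw content of the file
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


draft = "Writing is a process and subject to change.\n".encode("utf-8")
revised = "Writing is a process and always subject to change.\n".encode("utf-8")

print(git_blob_id(draft))
# Any change to the text, however small, yields a different identifier
print(git_blob_id(draft) != git_blob_id(revised))
```

Because the identifier depends only on the content, anybody holding a copy of the text can recompute and verify it without access to the original repository.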

3. collaboration / external review

git / GitHub: git allows for branching and forking of text. All changes / suggestions can be reviewed and the author can decide whether to accept or reject them.

{»We need additional collaborative tools«}

prose.io: author text on GitHub online.

2. Publishing

1. Generate an accessible representation of your text

While it would be absolutely sufficient to publish / distribute the plain-text files by means of a USB key, a graphical user interface (GUI) that translates the structural information into a formatted and aesthetically pleasing layout is often {==advisable==}{»better wording?«}, be it a printed paper copy or a website:

Pandoc: Pandoc is a tool for converting plain text and markdown into multiple (hence pan) target formats (thus doc), such as, but not limited to HTML, DOCX, and PDF. Pandoc, which is under active development by John MacFarlane, a professor of philosophy at UC Berkeley, also supports the formatting of references using BibTeX (another plain text format for storing structured bibliographic information) and CSL (citation style language).
HTML and CSS: HTML and CSS are well-established standards to separate form and content maintained by the World Wide Web Consortium (W3C). While the hyper text markup language (HTML) carries the content of our text as well as all the structural and formatting information and the metadata, cascading stylesheets (CSS) provide the actual layout. This combination makes it possible to provide different layouts for different contexts and devices using the same content.
Metadata: HTML5 supports semantic tags and machine-readable metadata following various standards, such as those provided by Dublin Core (DC) or schema.org. If one includes Dublin Core metadata in the head of HTML files, aggregators, search engines, and reference managers can find and extract the structured information on author, title, publication date, keywords, etc.
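As an illustration of the last point, the following Python sketch turns a small dictionary of metadata into lines for the head of an HTML file, following the common convention of prefixing element names with `DC.` and declaring the Dublin Core namespace through a `link` element. The function name `dublin_core_head` and the sample keywords are assumptions made for this example, not part of any standard tool.

```python
from html import escape

# Namespace of the fifteen core Dublin Core elements
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dublin_core_head(metadata):
    """Return HTML <head> lines exposing metadata as Dublin Core elements."""
    lines = ['<link rel="schema.DC" href="{}">'.format(DC_NAMESPACE)]
    for element, values in metadata.items():
        if isinstance(values, str):
            values = [values]  # allow a single value or a list of values
        for value in values:
            lines.append('<meta name="DC.{}" content="{}">'.format(
                element, escape(value, quote=True)))
    return "\n".join(lines)


print(dublin_core_head({
    "title": "Introduction to Plain Text Workflows and Sustainable Publishing",
    "creator": "Till Grallert",
    "date": "2017-02-22",
    "subject": ["plain text", "publishing"],  # illustrative keywords
}))
```

Repeating an element, as done here for `subject`, is the usual way of recording several keywords or several authors.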

2. Think about and provide a licence

{»mention copyright«}

A licence is a formal agreement that specifies the rights and duties of both the licensor (e.g. us as authors) and the licensee (e.g. us as readers). Its most important purpose within our discussion is to assure the readers of our texts of their rights to read, copy, and cite them. An open licence might for instance allow reproduction of the text but might prohibit charging for accessing the reproduction.

Formulating one’s own licence text is a challenge and one might not be familiar enough with the necessary “legalese” to write a text readers can rely on. In consequence we suggest looking at established licences and, having made a case for open access {»open science, open knowledge etc.«}, we suggest starting with Creative Commons licences.⁵

3. Publish / distribute the content

Websites of more than a single page of text tend to be technically complex and require an infrastructure of data storage, internet connections, web addresses, some content management system (cms), and databases that rarely come for free and without the need of maintenance. In addition, contemporary dynamic websites are almost impossible to archive or download in their entirety. To reduce complexity and the number of technologies, we suggest using static, self-contained websites without a content management system and no database.

jekyll: Jekyll is currently one of the most popular open tools to generate “dynamic” static, self-contained websites from plain text files containing MarkDown and Yaml, using nothing but HTML and CSS and nested files and folders. These can either be hosted online or distributed on a USB key.
GitHub pages: There are, of course, plenty of open, free or commercial providers to host your website. But having already signed up to GitHub and keeping all our texts in a GitHub repository, it is worthwhile to look at GitHub pages, which provide free hosting of version-controlled content and support jekyll out of the box. This means one can directly publish Markdown-formatted plain text files through GitHub pages and jekyll.

Challenges:

How to deal with sensitive data / material that should not be publicly accessible, such as ethnographic field notes?

OpenPGP: PGP stands for “pretty good privacy” and is widely used for encrypting emails. OpenPGP is a proposed standard specified in RFC 4880. It works with plain text and thus with all the tools covered in this {==course==}{»workshop«}.

Resources / literature:

The Programming Historian: open-access, peer-reviewed suite of tutorials that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate their research.

{»add relevant literature«}

Use of git and GitHub in the humanities

makerlab at UVic: git/github in 20 steps
Chad Black’s getting started with github and prose.io on using a combination of GitHub, jekyll, and prose.io for a collaborative seminar blog.
GitHub, Academia, and Collaborative Writing
Push, Pull, Fork: GitHub for Academics
GitHub for Academics: the open-source way to host, create and curate knowledge
Living in a Plain Text World (Tools We Use)
This software for academic paper writing is inspired by Git
How To: Use Git to Version Your Writing
Using git in my writing workflow

Tenen, Dennis and Grant Wythoff. “Sustainable Authorship in Plain Text Using Pandoc and Markdown.” ↩
MarkDown really is only a convention and John Gruber’s (and Aaron Swartz’ [yes, the Aaron Swartz]) canonical description of the Markdown syntax is at least partially ambiguous and lacks some core functionality for academic writing, such as support for tables and footnotes. In consequence, a plethora of formats (MultiMarkdown, GitHub flavored markdown, etc.) and software implementations inspired by and based on Markdown have proliferated, which makes the actual rendering of Markdown in HTML rather unpredictable beyond the core functionality. In recent years, a group of people involving John MacFarlane, professor of philosophy at UC Berkeley and author of Pandoc, proposed and developed a more rigid standard which they call CommonMark. ↩
Other options based on git are BitBucket and GitLab. ↩
GitHub provides its own clients for Mac and Windows ↩
There is a great list of open source licences, including links to their full texts, at opensource.org ↩

Presentation at ‘Dangerous Classes’ conference in Oxford

2017-02-18T17:06:47+00:00

On 26 January this year I had a chance to present a paper on food riots titled “Women in the streets! Urban food riots in late Ottoman Bilād al-Shām” at the conference “The ‘Dangerous Classes’ in the Middle East and North Africa” organised by Stephanie Cronin at St Antony’s College, University of Oxford.

As always, I have made the slides available on GitHub.

The difficulty of establishing publication dates for books from late Ottoman Bilād al-Shām

2016-10-12T10:13:53+00:00

Original post (2015-07-23)

I am currently preparing my thesis for publication and, in the process of revision, I am again turning to Ottoman legal texts and their translations. Today I want to come back briefly to a question I have dealt with extensively in my thesis: the difficulty of dating printed sources from the late Ottoman Bilād al-Shām. Consider the following image of the imprint for the second volume of Nawfal Niʿmat Allah Nawfal’s translation of Ottoman laws edited by Khalīl Khūrī and published by al-Maṭbaʿa al-Adabiyya in Beirut¹.

The date of publication is clearly stated as the year 1301. The calendar of this dating could either be Muslim 1301 (hijrī), which would translate to 1883/84 Gregorian, or Ottoman 1301 (mālī), which began on 13 March 1885. So far, so common. Without further ado (and, I strongly suspect, without further thought) the world’s libraries catalogued the book as having been published in 1883. The content of the book as well as the publishing house (al-Maṭbaʿa al-Adabiyya was a venture by Khalīl Sarkīs², the Greek Orthodox owner of Beirut’s most successful periodical and only daily newspaper Lisān al-Ḥāl) raise the probability of mālī reckoning.
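The arithmetic behind the mālī date can be sketched in a few lines of Python. The sketch rests on an assumption that should be stated loudly: for the decades concerned here (after the calendar reform of 1840 and before the changes of 1917), a mālī year Y is taken to begin on 1 March of the Julian year Y + 584. The function names are mine and serve illustration only; hijrī dates require a different, lunar computation that is not attempted here.

```python
from datetime import date


def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to the corresponding Gregorian date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Julian Day Number of the given Julian calendar date
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # Python's ordinal 1 (0001-01-01 Gregorian) equals Julian Day Number 1721426
    return date.fromordinal(jdn - 1721425)


def mali_year_start(mali_year):
    """Gregorian date of 1 Mart of a mālī year.

    Assumption: 1 Mart of mālī year Y falls on 1 March Julian of year Y + 584,
    which is only meant for mālī years between the 1840 and 1917 reforms.
    """
    return julian_to_gregorian(mali_year + 584, 3, 1)


print(mali_year_start(1301))  # 1885-03-13
```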

Then I came across announcements of its publication in the Beiruti press. Both Lisān al-Ḥāl and Thamarāt al-Funūn ran adverts for the new publication on their front pages in May 1887.³ Against the backdrop of the book’s publisher printing announcements in his own newspaper it now seemed likely that the second volume had inherited the date of publication from the first volume and was indeed printed only in 1887.

Unlike the first volume, scans of the second cannot be found online, but I was lucky to locate a copy at the library of the American University of Beirut. To my surprise the volume carried an ownership stamp on its last page:

It reads in French and Arabic:

Librairie Universelle 1883 Beyrout

li-l-maktaba al-jāmiʿa li-Khalīl al-Khūrī 1883 Bayrūt

The stamp seemingly indicates that the copy at AUB once belonged to the editor of the book, Khalīl al-Khūrī, himself. The stamp also records a Gregorian date: 1883. If this was the date of acquisition, the stamp could prove that the volume was indeed published in 1301 hijrī. Going through my research notes, however, it appeared that the Librairie Universelle was a publishing press and bookstore, rather than a library, run by the brothers Amīn and Khalīl al-Khūrī in Beirut. It is unclear when they had established the printing press, but at least by 1887 they had adopted the more common Arabic term for a publishing press: al-maṭbaʿa al-jāmiʿ.⁴ But why would a bookstore stamp its merchandise?

For the moment this question must remain as open as the opening date of the endeavour.

update 2016-02-26

I was lucky to have a one-month trial access to Gale’s new “Early Arabic Printed Books from the British Library” platform and eagerly browsed and searched for books from Beirut and Damascus. They hold a number of Niqūlā Efendi Naqqāsh’s translations of Ottoman laws. One of them, a translation of Orhan Vahan Efendi’s comment on the Commercial Code,⁵ carried another ownership stamp:

Librairie générale A. Sader Beyrouth

al-maktaba al-ʿumūmiyya li-Ibrāhīm Ṣādir Bayrūt

But this time we are better informed about the publisher al-Maktaba al-ʿUmūmiyya. This publishing house and bookstore was set up by Ibrāhīm Efendi Ṣādir in 1863. Soon the company also included his sons and operated under “Ibrāhīm Ṣādir wa-Awlāduhu”. As “Sader” the company is still active and the leading publisher of Lebanese legal compendia. It seems that most of its operations have shifted online these days.

update 2016-10-12

In May this year, I participated in a conference titled “Books in Motion” in Beirut and had a chance to finally meet Hala Auji and listen to her talk on “Visual Translations: The Shifting Material Dimensions of 19th-Century Printed Editions of Arabic Classics” — which was based on research conducted for her recently published book.⁶ Her talk focussed on the gradual shift from manuscript to print culture between Cairo, Beirut, and India and its visual aspects. Talking about the continuous popularity of al-Mutanabbī’s Dīwān and the plethora of editions published during the late 19th century, she projected the image of an edition held at Harvard’s Widener Library. According to Hala Auji this Dīwān was printed in Calcutta but the frontispiece carries the same stamp of Ibrāhīm Ṣādir’s al-Maktaba al-ʿUmūmiyya:

It is not entirely clear how Auji arrived at the conclusion that this edition was printed in Calcutta. Comparing the frontispieces it seems that a copy at University of Michigan, freely available through HathiTrust, is indeed the same edition. Its final page (292) states that Shaykh ʿUmar al-Rāfiʿī confirms the veracity of this print edition that was completed in 1283 aH [1866]. According to Ilyān Sarkīs’ union catalogue of Arabic printed works (Muʿjam al-maṭbūʿat al-ʿarabiyya wa-l-muʿarraba) this edition of 292 pages was printed on a lithographic printing press in Cairo.⁷

Nawfal, Nawfal Efendi Niʿmat Allāh. Al-dustūr: Tarjamahu min al-lughat al-turkiyya ilā al-ʿarabiyya Nawfal Niʿmat Allāh Nawfal bāshkātib kamāruk ʿArabistān sābiqan; bi-murājaʿa wa tadqīq Khalīl al-Khūrī mudīr maṭbūʿāt Wilāyat Sūriyya. Edited by Khalīl Efendi al-Khūrī. Vol.2. Bayrūt: al-Maṭbaʿa al-adabiyya, 1301. ↩
cf. MWT Salname Suriye 13 1298 aH [Dec. 1880]:247, UBTüb Salname Suriye 17 1302 aH [Oct. 1884]:250. ↩
Lisān al-Ḥāl 26 May 1887 (#959):1, Thamarāt al-Funūn 30 May 1887 (#633):1 advertised the book at a price of 2 mecidiye or Ps 40. ↩
e.g. Lisān al-Ḥāl 13 Oct. 1887 (#999):4. ↩
Vāḥān, Ohan. Sharḥ Qānūn al-Tijārah. Translated by Niqūlā Efendi Naqqāsh. Bayrūt: al-Maṭbaʿa al-ʿUmūmiyya, 1880. ↩
Auji, Hala. Printing Arab Modernity: Book Culture and the American Press in Nineteenth-Century Beirut. Leiden: Brill, 2016. ↩
Sarkīs, Yūsuf Ilyān. Muʿjam al-maṭbūʿat al-ʿArabiyya wa-l-muʿarraba: wa-huwa shāmil li-asmāʾ al-kutub al-maṭbūʿa fī al-aqtar al-sharqiyya wa-l-gharbiyya, maʿa dhikr asmāʾ muʾallifiha wa-lumʿa min tarjamātihim; wa-dhalik min yawm ẓuhūr al-ṭabaʿa ilā nihāyat al-sanat al-Hijriyya 1339 al-muwāfiqa li-sanat 1919 milādiyya. 2 vols. Vol.2. Miṣr: Maṭbaʿat Sarkīs, 1928; p.1616. ↩

Essay published in edited volume ‘Digital Humanities and Islamic & Middle East Studies’

2016-05-20T21:55:20+00:00

After more than two years the proceedings of the conference “Digital Humanities and Islamic & Middle East Studies”, including my methodological essay on mapping newspaper discourses on the topography of late Ottoman Damascus, have been published under the same title with de Gruyter. Elias Muhanna, who had organised the conference held in October 2013 at Brown University, did a great job as editor of the volume, which is now available online and, ironically, in print for the substantial price of € 99.95 / USD 140.¹ It comprises essays by Elias Muhanna, Travis Zadeh, Dagmar Riedel, Chip Rosetti, Nadia Yaqub, Maxim Romanov, Alex Bley, José Haro Peralta and Peter Verkinderen, Joel Blecher, Dwight F. Reynolds, and myself.

We are not allowed to share digital galley proofs of our essays, but I will make the text available here in the near future. The maps accompanying my essay and the code are already available on GitHub.

Muhanna, Elias (ed.). Digital Humanities and Islamic & Middle East Studies. Boston, Berlin: De Gruyter, 2016; Grallert, Till. “Mapping Ottoman Damascus Through News Reports: A Practical Approach.” In Digital Humanities and Islamic & Middle East Studies. Edited by Elias Muhanna. Boston, Berlin: De Gruyter, 2016: 175–98. ↩

Presentation of Digital Muqtabas at conference ‘Books in Motion’ in Beirut

2016-05-10T20:51:22+00:00

I was invited to present Digital Muqtabas at the conference “Books in Motion: Exploring concepts of mobility in cross-cultural studies of the book” organised by Sonja Mejcher-Atassi, Hala Auji and James Hodapp (all AUB) that took place at AUB and OIB between 5-7 May 2016. The beautiful conference poster-cum-programme is available as PDF.

My paper was titled: “Majallat al-Muqtabas between gray online libraries, large-scale scanning efforts, and programming tools: producing fully open, collaborative, and scholarly editions of early Arabic periodicals” and you can find the abstract below. It was part of a panel on digital remediation of the book on Saturday, 7 May, which I shared with David Wrisley (AUB) and Torsten Wollina (OIB). Torsten spoke on the challenges posed by the current state of digitization of books, manuscripts, and catalogues to researchers of the Islamicate world. David presented the fascinating results of a course he taught on mapping Beirut’s publishing industry. The abstract to his paper is online as is the project website.

As always, I have made the slides available on GitHub.

Abstract

Moving from the material to the seemingly immaterial, digitisation offers remedies for some of the Middle East’s most pressing issues when it comes to books as texts and cultural artifacts: protection, discovery, and access—particularly in times of war and iconoclasm, borders (between territories, linguistic communities, classes etc.), and highly dispersed audiences and artifacts. Yet, digitisation and the infrastructure to deliver digital artifacts is expensive and thus we have not a single scholarly digital edition of early Arabic printed books or periodicals—despite their importance for the history of the nahḍa, Arab political nationalism, and the Islamic reform movement; and despite the apparent promises for new methodological approaches to the book.

Some of the largest scanning projects, HathiTrust, the Endangered Archives Programme (EAP), or MenaDoc produce digital facsimiles for tens of thousands of Arabic books; but facsimiles cannot be searched and reliable OCR of Arabic script is not even available to Google. Gray online-libraries of Arabic literature, namely shamela.ws, provide access to a vast body of transcriptions of unknown provenance, editorial principles, and quality; but the transcriptions can be neither trusted nor referenced.

With the open digital edition of Muḥammad Kurd ʿAlī’s Majallat al-Muqtabas (1906–18) we want to show that through re-purposing well-established open software and by bridging the gap between immensely popular but non-academic online-libraries of volunteers and academic scanning efforts as well as editorial expertise, one can produce scholarly editions that remedy the short-comings of either world with very small funds: We use digital texts from shamela.ws, transform them into TEI XML—the quasi-standard for digital scholarly editions—add light structural mark-up, bibliographic meta-data, and link each page to facsimiles provided through EAP and HathiTrust. The digital edition (TEI XML and a basic web display) is then hosted as a public GitHub repository with a Creative Commons BY-SA 4.0 licence. Improvements can be crowd-sourced with clear attribution of authorship and version control using GitHub’s core functionality. Editions are referencable down to the word level for scholarly citations, annotation layers, and web-applications through a documented URI scheme. The web-display can be downloaded and run locally without an internet connection—a necessity for societies outside the global North, which again transforms the book into a highly mobile cultural artifact to be shared among intellectual networks across borders.

Archive.sakhrit.co’s failure as a source for digitised imagery of Arabic journals

2016-04-22T20:44:55+00:00

Recently, a colleague pointed me to yet another gray online library of Arabic material, one that was entirely dedicated to cultural and literary journals. Arshīf al-majallāt al-adabiyya wa-l-thaqafiyya al-ʿarabiyya (archive.sakhrit.co) presents a large number of Arabic journals over very long publication periods, providing:

Partially watermarked digital imagery
Functional tables of content for each issue, including author, title, page number
Some bibliographic metadata on the issue level

They do not provide a digital, machine-readable text.

Focus of the corpus

The focus is on cultural and scientific journals of the 20th century but they also have some journals of the late 19th and early 20th centuries, among them:

As one would imagine, I was excited to see a seemingly complete scan of al-Muqtabas among the journals hosted by archive.sakhrit. I am currently working on a digital scholarly and collaborative edition of this journal (see the project’s GitHub repository and blog)¹ and only found accessible scans of volumes 1 to 8. Thus, the prospect of an additional and potentially complete scan, including volume 9, was exciting. But after my initial enthusiasm, I was in for a serious disappointment.

Quality of the corpus

As with other gray libraries, such as al-Maktaba al-Shāmila (shamela.ws), archive.sakhrit is quiet about the personnel or company behind it. It remains unclear where the originals came from, who scanned them, who transcribed the heads, authors, and page numbers seemingly available for every article. The rather illegal / gray nature of the endeavour becomes clear from the shift from a .com to a .co domain (country code top-level domain for Colombia) documented by the watermark in the imagery that still refers to the http://Archivebeta.Sakhrit.com domain.

I have assessed the quality of their “scans” of al-Muqtabas. Some volumes / issues have been scanned from the original or a facsimile edition. Others, such as at least volumes 4 and 5, were indeed rendered from a modern digital text, namely shamela’s transcription. This is supported by the strikingly similar absence of all footnotes and non-Arabic script; a modern punctuation not present in the original; paragraph breaks that mirror shamela’s transcription; and the ellipsis between the two sections of a bayt, as provided by shamela (e.g. archive.sakhrit and shamela). The final piece of evidence is that an uncommented gap of almost three pages in shamela’s transcription of volume 5(7) is reproduced in archive.sakhrit’s supposed facsimiles (compare the issue on digital-muqtabas, shamela, and archive.sakhrit).

At *archive.sakhrit* and *shamela* the text stops at من هذا الناشيء

Scans of the original page reveal that the text just continues on line 6 with ابو ابعباس

Another, rather common, problem is archive.sakhrit’s bibliographic metadata on both the article and the issue level. The first is obviously compromised by the reference to image renderings of shamela’s transcription, whose pagination does not correspond to the printed original. In addition, the tables of content provide only an eclectic selection of articles and sections and many articles are mis-attributed (for an example compare the MODS file for our digital edition of Muqtabas 4(1) with archive.sakhrit’s fihris of the same issue). The second issue relates to the publication dates. For al-Muqtabas, archive.sakhrit assumes a publication schedule in which volumes correspond to Gregorian years and issues correspond to Gregorian months (i.e. according to archive.sakhrit Muqtabas 1(1) was published on 1 January 1906). This is despite the fact that al-Muqtabas clearly states its publication schedule on the front page of every volume as adhering to the hijrī calendar for both volumes and issues (e.g. archive.sakhrit’s facsimile of this issue’s first page). As a consequence, bibliographic data obtained from archive.sakhrit cannot be considered reliable in any sense.

Therefore, archive.sakhrit is even more problematic than shamela in terms of scholarly use.² The user is always aware of reading a derivative with an unknown relation to an assumed original while accessing a text from shamela. At archive.sakhrit, on the other hand, the user is deceived by a seemingly faithful representation of a fake original.

Access, structure of the website

In addition to the user interface of the website, which has a severe 1990s look and feel but otherwise seems to be fully functional, the collection can be accessed in a number of ways that would ease automated access for other applications.

Individual issues can be accessed
- by modifying the ISSUEID parameter in the URL: http://archive.sakhrit.co/newPreview.aspx?ISSUEID=5649
- note that ISSUEID is a single variable across the entire website and not specific to any one single journal.
- note also that issues of an individual journal have no consecutive ISSUEIDs.
Individual page images can be accessed through a seemingly structured URL:
- http://archive.sakhrit.co/MagazinePages/Magazine_JPG/AL_moqtabs/AL_moqtabs_1906/Issue_1/001.JPG
- http://archive.sakhrit.co/MagazinePages/Magazine_JPG/AL_moqtabs/AL_moqtabs_1906/Issue_11/553.JPG
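The two access patterns can be expressed as a short Python sketch that merely assembles URLs and makes no network requests. It assumes that the structure visible in the example URLs above generalises to other issues, years, and journals, and that page numbers are always zero-padded to three digits; neither assumption has been verified beyond these examples. The function names are hypothetical.

```python
BASE = "http://archive.sakhrit.co"


def issue_url(issue_id):
    """URL of the preview page for an issue, addressed by its ISSUEID."""
    return "{}/newPreview.aspx?ISSUEID={}".format(BASE, issue_id)


def page_image_url(journal, year, issue, page):
    """URL of a single page image, following the pattern of the examples.

    `journal` is the folder name used by the site (e.g. "AL_moqtabs");
    zero-padding of the page number to three digits is an assumption.
    """
    return "{base}/MagazinePages/Magazine_JPG/{j}/{j}_{y}/Issue_{i}/{p:03d}.JPG".format(
        base=BASE, j=journal, y=year, i=issue, p=page)


print(issue_url(5649))
print(page_image_url("AL_moqtabs", 1906, 1, 1))
print(page_image_url("AL_moqtabs", 1906, 11, 553))
```

Given the unreliable pagination and dating discussed above, any images retrieved this way would still have to be checked against a trustworthy copy.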

I have cross-posted this post to the Digital Muqtabas’ project blog. ↩
I found at least one university library’s website linking to al-Muqtabas on archive.sakhrit, AUC. ↩

Presentation of Digital Muqtabas at DiXiT Convention 2 in Cologne

2016-03-17T16:28:14+00:00

I was invited to present Digital Muqtabas at DiXiT’s second convention on “Academia, Cultural Heritage, Society” that took place in Cologne between 14–18 March. The paper, titled: “The journal al-Muqtabas between Shamela.ws, HathiTrust, and GitHub: producing open, collaborative, and fully-referencable digital editions of early Arabic periodicals—with almost no funds”, was part of a panel on “Social Editing & Funding”, which I was lucky to share with Ray Siemens, who skyped in from Victoria, and Misha Broughton.

Slides can be found here and the abstract here.

The historian’s puzzle: various differences between copies of printed periodicals that ought to be similar. The case of Düstur

2016-01-29T00:00:00+00:00

During the last years I have sporadically written about the various surprises of early Arabic and Ottoman printed books and particularly the vast differences in pagination, spelling, and even content between copies that ought to be identical if one were to believe the information on the cover or the metadata provided by library catalogues (on this blog: here, here, and here). This post is a first attempt to summarise my findings on the publication history of the first series (tertib-i evvel) of Düstur, the officially sanctioned collection of Ottoman laws and regulations, published in Istanbul between 1872 and 1879.

I discovered that, contrary to my expectations, libraries around the world hold numerous unmarked editions and print-runs of tertib-i evvel of Düstur. Copies vary in pagination, spelling, and content. Yet neither the people I asked nor the scholarly works citing copies of Düstur seem to be aware of significant differences between copies of the same volume. This is not too unexpected an outcome when one considers that most scholars would not consult more than a single copy of every work at a single library; and once you have read and / or copied a work, you would not consult another copy at another library with the explicit purpose of comparing the two for dissimilarities. I had also noted that an 1891 index to the first series of Düstur¹ does not contain any information on divergent print-runs and editions.

In consequence, it is almost impossible to confirm references found in scholarly literature. Over the past years I had come to consider the many seemingly wrong references provided, for instance, by the two foremost contemporaneous French translations of Ottoman laws by Aristarchi² and Young³ as, well, erroneous references caused by careless printers, copy-editors, even the translators themselves. But as it stands, they could have just used a different copy than the one available to me.

To illustrate the issue, I had quickly built a simple website providing imagery for different versions of the table of contents of the first, third, and fourth volume of tertib-i evvel of Düstur.

As I could not readily find any concordance or works dealing with this issue (which by the way also pertains to nineteenth-century Arabic monthly journals), I wondered whether anybody on various mailing lists could point me to relevant information. Düstur was the official collection of Ottoman legal texts at the time and the differences between the various print-runs had potentially grave consequences. Yet, to my surprise (again) almost nobody in the scholarly community of Ottomanists seemed to be aware of these puzzling divergences and no reply had any answers to offer.

The first volume of Düstur, tertib-i evvel⁴

In late 2012, I had access to what I thought were four different copies of volume 1 of tertib-i evvel of Düstur. Two were held at the Hakki Tarık Us Collection (HTU) at the Beyazit Devlet Kütüphanesi⁵, and one each at the University of California, Berkeley and the School of Oriental and African Studies (SOAS), London. I grouped these four copies into two editions, based on the substantial differences in both layout and content. I further subdivided the second edition into three print-runs, which differ in spelling and printing errors (for the lack of a better term). Safa Saraçoğlu of Bloomsburg University, PA, solved at least part of the riddle in early 2013. He pointed out via email that, before Düstur became a series from 1872 onwards, three independent volumes were published under the same title of Düstūr in 1851⁶, 1863⁷, and 1866⁸. Hence, what I thought of as the first edition of the first volume of the first series (tertib-i evvel) of Düstur turned out to be the 1863 volume⁷. Nevertheless, the issue remains that there are at least three print-runs of the first volume that differ in spelling and printing errors. Two of them are available online: one at HTU and the UC Berkeley copy through HathiTrust (if one has an American IP, that is). A fourth copy is available through the digital collections of Türk Büyük Millet Meclisi Kütüphanesi Açık Erişim Koleksiyonu (TBMM) but I have not yet checked it against the other three.

The second volume⁹

I currently have three digital copies of the second volume of tertib-i evvel of Düstur: from HTU, UC Berkeley, and TBMM. In addition, I have seen the physical copy at SOAS. The difference in the shape of the number 6 (or rather “٦”) and the different font width in the UC Berkeley copy indicate two independently type-set print-runs. Pagination is seemingly identical.

The third volume¹⁰

There are at least two editions or print-runs of the third volume of tertib-i evvel of Düstur that differ in spelling and, marginally, in page layout. The first can be found at SOAS and HTU and the second at UC Berkeley and TBMM.

The fourth volume¹¹

At a workshop on Ottoman municipalities at the Istanbul Şehir Üniversitesi in November 2015, I finally met Safa Saraçoğlu in person. We had long and interesting discussions on Ottoman legal history, digitisation efforts, and translations of Ottoman legislation into the various languages of the empire. We also shamelessly shared our private copies of Ottoman texts, among them Mīltiyādī Ḳārāvokīros’s Ottoman legal dictionary. To my surprise, the entry on expropriation of real-estate (istimlāk) references two print-runs of the fourth volume of the first series of Düstur with different paginations.¹² According to this entry, among the copies I have seen, the one at SOAS would have originated in the first print-run, while those at Staatsbibliothek zu Berlin (SBB) and TBMM are part of the second print-run.

Ḳaraḳoç, Sarkiz. Miftāḥ-i Ḳavānīn-i ʿOŝmāniye. Der-i Saʿādet: Maḥmūd Bey Maṭbaʿası, 1309 aH [1891]. ↩
Aristarchēs, Grēgorios Bey. Législation Ottomane, Ou Recueil des Lois, Réglements, Ordonnances, Traités, Capitulations et Autres Documents Officiels de L’Empire Ottoman. Edited by Dēmētrios Nikolaides. Vol.1-7. Constantinople: Imprimerie Frères Nicolaides / Bureau du Journal Thraky, 1873–88. ↩
Young, George. Corps de Droit Ottoman: Recueil des Codes, Lois, Règlements, Ordonnances et Actes les Plus Importants du Droit Intérieur, et D’Études Sur le Droit Coutumier de L’Empire Ottoman. Vol.I-VII. Oxford: Clarendon Press, 1905–06. ↩
N.N. Düstur: Kavanin Ve Nizamat Ve Muahedat Ile Umuma Ait Mukavelat Ve Iradat-i Seniyeyi Muhtevidir. Vol.1 Tertip I. Der-i Saʿādet: Maṭbaʿa-yi ʿĀmire, 1289 aH [1872]. ↩
Large parts of this collection were digitised in a cooperation with the Tokyo University of Foreign Studies. ↩
N.N. Düstūr. Vol.[1]. [Der-i Saʿādet]: Taḳvīmḫāne-yi ʿĀmire, 15 Rab II 1267 aH [17 Feb. 1851]. ↩
N.N. Düstur: Ḳavānīn Ve Niẓāmātıñ Münderic Olduġu Mecmūʿa. Vol.[2]. Der-i Saʿādet: Maṭbaʿa-yi ʿĀmire, Shaʿ 1279 aH [Feb. 1863]. This volume is available online from the Hakkı Tarık Us Collection, where it is wrongly catalogued as volume four of the first series. ↩ ↩²
N.N. Düstur: Ḳavānīn Ve Niẓāmātıñ Münderic Olduġu Mecmūʿa. Vol.[3]. Der-i Saʿādet: Maṭbaʿa-yi ʿĀmire, 1866. ↩
N.N. Düstur: Kavanin Ve Nizamat Ve Muahedat Ile Umuma Ait Mukavelat Ve Iradat-i Seniyeyi Muhtevidir. Vol.2 Tertip I. Der-i Saʿādet: Maṭbaʿa-yi ʿĀmire, 1289 aH [1872]. ↩
N.N. Düstur: Kavanin Ve Nizamat Ve Muahedat Ile Umuma Ait Mukavelat Ve Iradat-i Seniyeyi Muhtevidir. Vol.3 Tertip I. Der-i Saʿādet: Maṭbaʿa-yi ʿĀmire, 1289 aH [1876]. ↩
N.N. Düstur: Kavanin Ve Nizamat Ve Muahedat Ile Umuma Ait Mukavelat Ve Iradat-i Seniyeyi Muhtevidir. Vol.4 Tertip I. Der-i Saʿādet: Maṭbaʿa-yi ʿĀmire, 1295 aH [1879]. ↩
Ḳārāvokīros, Mīltiyādī. Lüġat Ḳavānīn-i ʿOŝmāniye. Istānbūl: “A. Aṣādūriyān” Şereket-i Mürettebe Maṭbaʿası, 1310 R [1894/95], p.79 ↩

Majallat al-Muqtabas: one of the most important journals of late Ottoman Bilād al-Shām as open, collaborative, scholarly digital edition

2015-11-13T17:01:16+00:00

[Update: the project has its own blog]

In the context of the current onslaught that cultural artifacts in the Middle East face from the iconoclasts of the Islamic State, from the institutional neglect of states and elites, and from poverty and war, digital preservation efforts promise some relief as well as potential counter narratives. They might also be the only recourse for future education and rebuilding efforts once the wars in Syria, Iraq or Yemen come to an end.

Early Arabic periodicals, such as Butrus al-Bustānī’s al-Jinān (Beirut, 1876–86), Yaʿqūb Ṣarrūf, Fāris Nimr, and Shāhīn Makāriyūs’ al-Muqtaṭaf (Beirut and Cairo, 1876–1952), Muḥammad Kurd ʿAlī’s al-Muqtabas (Cairo and Damascus, 1906–16) or Rashīd Riḍā’s al-Manār (Cairo, 1898–1941) are at the core of the Arabic renaissance (al-nahḍa), Arab nationalism, and the Islamic reform movement. Due to the state of Arabic OCR and the particular difficulties of low-quality fonts, inks, and paper employed at the turn of the twentieth century, they can only be digitised by human transcription. Yet despite their cultural significance, and unlike for valuable manuscripts and high-brow literature, funds for transcribing the tens to hundreds of thousands of pages of an average mundane periodical are simply not available. Consequently, we still do not have a single digital scholarly edition of any of these journals. But some of the best-funded scanning projects, such as HathiTrust, produced digital imagery of numerous Arabic periodicals, while gray online-libraries of Arabic literature, namely shamela.ws, provide access to a vast body of Arabic texts including transcriptions of unknown provenance, editorial principles, and quality for some of the mentioned periodicals. In addition, these gray “editions” lack information linking the digital representation to material originals, namely bibliographic meta-data and page breaks, which makes them almost impossible to employ for scholarly research.

With the GitHub-hosted TEI edition of Majallat al-Muqtabas we want to show that through re-purposing available and well-established open software and by bridging the gap between immensely popular, but non-academic (and, at least under US copyright laws, occasionally illegal) online libraries of volunteers and academic scanning efforts as well as editorial expertise, one can produce scholarly editions that remedy the short-comings of either world with very small funds: We use digital texts from shamela.ws, transform them into TEI XML, add light structural mark-up for articles, sections, authors, and bibliographic metadata, and link them to facsimiles provided through the British Library’s “Endangered Archives Programme” and HathiTrust (in the process of which we also make first corrections to the transcription). The digital edition (TEI XML and a basic web display) is then hosted as a GitHub repository with a CC BY-SA 4.0 licence.

By linking images to the digital text, every reader can validate the quality of the transcription against the original, thus overcoming the greatest limitation of crowd-sourced or gray transcriptions and the main source of disciplinary contempt among historians and scholars of the Middle East. Improvements of the transcription and mark-up can be crowd-sourced with clear attribution of authorship and version control using git and GitHub’s core functionality. Editions will be referencable down to the word level^[currently we provide stable URLs down to the paragraph level] for scholarly citations, annotation layers, as well as web-applications through a documented URI scheme. The web-display is implemented through a customised adaptation of the TEI Boilerplate XSLT stylesheets; it can be downloaded, distributed and run locally without any internet connection, which is a necessity for societies outside the global North. Finally, by sharing all our code (mostly XSLT) in addition to the XML files, we hope to facilitate similar projects and digital editions of further periodicals, namely Rashīd Riḍā’s al-Manār.

Scope of the project

The purpose and scope of the project is to provide an open, collaborative, referencable, scholarly digital edition of Muḥammad Kurd ʿAlī’s journal al-Muqtabas, which includes the full text, semantic mark-up, and digital imagery.

The digital edition will be provided as TEI P5 XML with its own schema. All files are hosted on GitHub.

The project will open avenues for re-purposing code for similar projects, i.e. for transforming full-text transcriptions from some HTML or XML source, such as al-Maktaba al-Shamela, into TEI P5 XML, linking them to digital imagery from other open repositories, such as EAP and HathiTrust, and generating a web display by, for instance, adapting the code base of TEI Boilerplate.

The most likely candidates for such follow-up projects are

Muḥammad Rashīd Riḍā’s journal al-Manār
- full text from shamela: 8605 views
- imagery from HathiTrust; imagery / PDFs from the Internet Archive, which are linked from al-Maktaba al-Waqfiyya
ʿAbdallah al-Nadīm’s majallat al-ustādh, Cairo, 24 Aug 1892
- full text from shamela: 11337 views.
ʿAbd al-Qādir bin Muḥammad Salīm al-Kaylānī al-Iskandarānī’s majallat al-ḥaqāʾiq (al-dimashqiyya), 1910
- full text from shamela: 5134 views.

The journal al-Muqtabas

Muḥammad Kurd ʿAlī published the monthly journal al-Muqtabas between 1906 and 1914 (1916). The publication schedule followed the Muslim hijrī calendar and, after the Young Turk Revolution of July 1908, publication moved from Cairo to Damascus in the journal’s third year.

There is some confusion as to the counting of issues and their publication dates. According to the masthead and the cover sheet, al-Muqtabas was published following the Islamic hijrī calendar (from the journal itself it must remain open whether the recorded publication dates were the actual publication dates). Sometimes the printers made errors: issue 2 of volume 4, for instance, carries Rab I 1327 as publication date on the cover sheet, but Ṣaf 1327 in its masthead. The latter would correspond to the official publication schedule.
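To see what the two competing dates on issue 2 of volume 4 mean in Gregorian terms, a rough conversion can be sketched with the tabular (arithmetic) Islamic calendar. This is an approximation under stated assumptions: the function name is hypothetical, the civil epoch is used, and dates based on actual moon sightings may differ from the result by a day or two.

```python
from datetime import date


def hijri_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a tabular Islamic calendar date to a Gregorian date.

    Uses the arithmetic calendar with a 30-year leap cycle and the
    civil epoch; observed calendars may deviate by a day or two.
    """
    # Julian Day Number of the given hijrī date
    jdn = (
        (11 * year + 3) // 30
        + 354 * year
        + 30 * month
        - (month - 1) // 2
        + day
        + 1948440
        - 385
    )
    # Python's proleptic Gregorian ordinal 1 corresponds to JDN 1721426
    return date.fromordinal(jdn - 1721425)


# First days of Ṣafar and Rabīʿ I 1327, the two months named on the issue
print(hijri_to_gregorian(1327, 2, 1))
print(hijri_to_gregorian(1327, 3, 1))
```

The same function could be used to check any Gregorian date assigned to an issue by a catalogue or website against the hijrī month printed in the masthead.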

Samir Seikaly argues that Muḥammad Kurd ʿAlī was wrong in stating in his memoirs that he published 8 volumes of 12 issues each and two independent issues.^[{Seikaly 1981@128}] But the actual hard copies at the Orient-Institut Beirut and the digital facsimiles from HathiTrust show that Kurd ʿAlī was right insofar as volume 9 existed and comprised 2 issues only. As it turns out, al-Muqtabas published a number of double issues: Vol. 4 no. 5/6 and Vol. 8 no. 11/12.

In addition to the original edition, at least one reprint appeared: In 1992 Dār Ṣādir in Beirut published a facsimile edition, which is entirely unmarked as such but for the information on the binding itself. Checking this reprint against the original, it appeared to be a facsimile reprint: pagination, font, layout — everything is identical. But as Samir Seikaly remarked in 1981 that he used “two separate compilations of al-Muqtabas […] in this study” there must be at least one other print edition that I have not yet seen.^[{Seikaly 1981@128}]

Digital imagery

Image files are available from the al-Aqṣā Mosque’s library in Jerusalem through the British Library’s “Endangered Archives Programme” (vols. 2-7), HathiTrust (vols. 1-6, 8), and Institut du Monde Arabe. Due to its open access licence, preference is given to facsimiles from EAP.

EAP119

links to volumes:
- Vol. 2
- Vol. 3
- Vol. 4
- Vol. 5
- Vol. 6
- Vol. 7
access:
- the journal is in the public domain and the images can be freely accessed without restrictions. EAP does not provide a download button.
- Terms of access for material provided by the British Library can be found here

HathiTrust

links to volumes
- Vol. 1
- Vol. 2
- Vol. 3
- Vol. 4
- Vol. 5
- Vol. 6
- Vol. 8
- Index
access
- The journal is in the public domain in the US and can be freely accessed and downloaded
- Outside the US, access is restricted.
- Formal licence:

Public Domain or Public Domain in the United States, Google-digitized: In addition to the terms for works that are in the Public Domain or in the Public Domain in the United States above, the following statement applies: The digital images and OCR of this work were produced by Google, Inc. (indicated by a watermark on each page in the PageTurner). Google requests that the images and OCR not be re-hosted, redistributed or used commercially. The images are provided for educational, scholarly, non-commercial purposes. Note: There are no restrictions on use of text transcribed from the images, or paraphrased or translated using the images.

Full text

Somebody took the pains to create fully searchable text files and uploaded everything to al-Maktaba al-Shamela and WikiSource.

al-Maktaba al-Shāmila

Extent: According to the main entry, shamela has all 96 issues.
Transcribers, editors: Apparently, they have been typed and copy-edited by unnamed humans.
Features edition: paragraphs, page breaks, headlines.
Features interface:
- all issues can be browsed for headlines and searched
- all pages can be individually addressed in the browser: http://shamela.ws/browse.php/book-26523#page-2290

WikiSource

It seems that somebody took the pains to upload the text from shamela to WikiSource. Unfortunately it is impossible to browse the entire journal. Instead one has to address each individual and consecutively numbered issue, e.g. Vol. 4, No. 1 is listed as No. 37.
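The consecutive numbering can be sketched as simple arithmetic, assuming a nominal 12 issues per volume (which is what the example of Vol. 4, No. 1 being listed as No. 37 suggests). The function names are hypothetical, and how double issues are counted on WikiSource is not covered here.

```python
ISSUES_PER_VOLUME = 12  # nominal count; double issues are not modelled


def consecutive_number(volume: int, issue: int) -> int:
    """Map volume and issue to a running issue number."""
    return (volume - 1) * ISSUES_PER_VOLUME + issue


def volume_and_issue(number: int) -> tuple[int, int]:
    """Map a running issue number back to (volume, issue)."""
    volume, issue = divmod(number - 1, ISSUES_PER_VOLUME)
    return volume + 1, issue + 1


print(consecutive_number(4, 1))
print(volume_and_issue(37))
```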

TEI edition of al-Muqtabas

The main challenge is to combine the full text and the images in a digital XML edition following the TEI. As al-maktabat al-shāmila did not reproduce page breaks true to the print edition, every single one of the more than 6000 page breaks must be added manually and linked to the digital image of the page.
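As a minimal sketch of what one such link could look like, the following builds a TEI `pb` (page break) element that carries the page number in `n` and points to an image through `facs`. The helper name and the image URL are hypothetical, and the edition itself may encode the link differently (for instance by pointing to a facsimile section rather than directly to an image file).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def make_pb(page_number: int, image_url: str) -> ET.Element:
    """Create a TEI page break linked to the image of that page."""
    pb = ET.Element(f"{{{TEI_NS}}}pb")
    pb.set("n", str(page_number))  # page number as printed
    pb.set("facs", image_url)      # pointer to the facsimile image
    return pb


# Hypothetical image location, for illustration only
pb = make_pb(1, "https://example.org/facsimiles/vol-4/001.jpg")
print(ET.tostring(pb, encoding="unicode"))
```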

General design

The edition should be conceived of as a corpus of TEI files that are grouped by means of XInclude. This way, volumes can be constructed as single TEI files containing a […] of TEI files and a volume-specific […] and […].
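How grouping by XInclude works can be sketched with Python’s standard library. Everything here is an assumption made for illustration: the file names, the in-memory stand-ins for issue files, and the `teiCorpus` wrapper, which is my choice for the sketch and not necessarily the structure the edition uses. TEI headers are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical in-memory stand-ins for issue files on disk
ISSUE_FILES = {
    "issue_1.xml": f'<TEI xmlns="{TEI}"><text><body><p>Issue 1</p></body></text></TEI>',
    "issue_2.xml": f'<TEI xmlns="{TEI}"><text><body><p>Issue 2</p></body></text></TEI>',
}

# A volume file that only points to its issues
VOLUME = f"""<teiCorpus xmlns="{TEI}" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="issue_1.xml"/>
  <xi:include href="issue_2.xml"/>
</teiCorpus>"""


def load_issue(href, parse, encoding=None):
    """Loader for ElementInclude: resolve hrefs against the in-memory files."""
    return ET.fromstring(ISSUE_FILES[href])


volume = ET.fromstring(VOLUME)
# Replace every xi:include element with the parsed content it points to
ElementInclude.include(volume, loader=load_issue)
issues = volume.findall(f"{{{TEI}}}TEI")
print(len(issues))
```

The advantage of this design is that each issue remains a small, independently editable file, while a volume is assembled only when needed.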

A detailed description and notes on the mark-up are kept in a separate file in the GitHub repository.

Quality control

A simple way of controlling the quality of the basic structural mark-up would be to cross-check any automatically generated table of contents or index against the published tables of contents at the end of each volume and against the index of al-Muqtabas published by Riyāḍ ʿAbd al-Ḥamīd Murād in 1977.
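Such a cross-check could be sketched as a comparison of two lists of headlines. The function name and the sample headlines are hypothetical, and the sketch assumes that both lists have already been normalised to the same spelling, which is a non-trivial step for Arabic text.

```python
def compare_tocs(generated: list[str], published: list[str]) -> tuple[list[str], list[str]]:
    """Compare a generated table of contents with a published one.

    Returns the headlines missing from the mark-up and the headlines
    that appear in the mark-up but not in the published table.
    """
    gen = {headline.strip() for headline in generated}
    pub = {headline.strip() for headline in published}
    return sorted(pub - gen), sorted(gen - pub)


# Placeholder headlines, for illustration only
missing, unexpected = compare_tocs(
    generated=["Article A", "Article B"],
    published=["Article B", "Article C"],
)
print(missing, unexpected)
```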

Web display: TEI Boilerplate

To allow for a quick review of the mark-up and for reading the journal’s content, I decided to customise TEI Boilerplate for a first display of the TEI files in the browser without need for pre-processed HTML and to host this heavily customised boilerplate view as a separate branch of the GitHub repository. For a first impression see here.

Middle Eastern Graffiti

2015-11-12T00:00:00+00:00

Thanks to the British Library’s Endangered Archives Programme (EAP) we not only have access to the collection of late Ottoman periodicals from the library of al-Aqṣā Mosque (EAP119) in Jerusalem, but also to more than 3,000 photos of Maison Bonfils from the Fouad Debbas Collection in Beirut (EAP644) that were made available to the public in 2014.

Looking through this vast body of images (a large part of which documents the ruins and antiquities in the Levant and Egypt), one can find an astonishing record of late-nineteenth century graffiti: names and dates scribbled on walls and columns to document the visit of local and foreign travellers. The urge to record one’s existence was not restricted to the new middle-class traveller. Famously the Ottoman Sultan ʿAbdülḥamīd II and the German Kaiser Wilhelm II had marble plaques affixed to the ruins of Baalbek to commemorate the latter’s visit to the site in 1898. But looking through the images I stumbled over another graffito:

Shāhīn Makāriyūs 1878

This is the same Shāhīn Makāriyūs (also Chahin Macarius, Shaheen Makarius), who, together with Yaʿqūb Ṣarrūf and Fāris Nimr, published the monthly journal al-Muqtaṭaf from 1876 onwards, first in Beirut and, from 1884 onwards, in Cairo. He was also a prominent member of masonic lodges and authored a number of books on the history of freemasonry in Arabic and English. In 1890, al-Bashīr from Beirut already mentioned al-Muqtaṭaf and al-Laṭāʾif, another journal Shāhīn Makāriyūs edited from 1886 to 1896, as being masonic papers.^[Bashīr 4 Nov. 1890 (#1037):3 and Bashīr 26 Nov. 1890 (#1040):1, which ran a front page article titled “The slander of al-Laṭāʾif” (iftirāʾ al-laṭāʾif).] Al-Laṭāʾif al-Muṣawwara, an illustrated successor to al-Laṭāʾif (which had most likely ceased publication around 1896), was edited by his son (?) Iskandar Makāriyūs in Cairo from 1915 onwards.

But it’s not just visitors leaving their traces for posterity; some images reveal that the walls of the ruins were also used for advertisements. Several photos of the entrance to the cella of Baalbek’s main temple of Jupiter dating to the 1870s and onwards show that the photo studio Bonfils itself had announced its services on the wall to the right:

BONFILS Photogra[phie …]

a Beyro[uth …]

Vues de Balbek […]

1871