Digital Muqtabas: An open, collaborative, and scholarly digital edition of Muḥammad Kurd ʿAlī’s early Arabic periodical Majallat al-Muqtabas (1906–1917/18)

In the context of the current onslaught that cultural artifacts in the Middle East face from the iconoclasts of the Islamic State, from the institutional neglect of states and elites, and from poverty and war, digital preservation efforts promise some relief as well as potential counter-narratives. They might also be the only recourse for future education and rebuilding efforts once the wars in Syria, Iraq or Yemen come to an end.

Early Arabic periodicals, such as Butrus al-Bustānī’s al-Jinān (Beirut, 1876–86), Yaʿqūb Ṣarrūf, Fāris Nimr, and Shāhīn Makāriyūs’ al-Muqtaṭaf (Beirut and Cairo, 1876–1952), Muḥammad Kurd ʿAlī’s al-Muqtabas (Cairo and Damascus, 1906–16) or Rashīd Riḍā’s al-Manār (Cairo, 1898–1941) are at the core of the Arabic renaissance (al-nahḍa), Arab nationalism, and the Islamic reform movement. Due to the state of Arabic OCR and the particular difficulties of low-quality fonts, inks, and paper employed at the turn of the twentieth century, they can only be digitised by human transcription. Yet despite their cultural significance, and unlike for valuable manuscripts and high-brow literature, funds for transcribing the tens to hundreds of thousands of pages of an average mundane periodical are simply not available. Consequently, there is still not a single digital scholarly edition of any of these journals. Some of the best-funded scanning projects, such as HathiTrust, have nevertheless produced digital imagery of numerous Arabic periodicals, while gray online libraries of Arabic literature, namely shamela.ws, provide access to a vast body of Arabic texts, including transcriptions of unknown provenance, editorial principles, and quality for some of the periodicals mentioned. In addition, these gray “editions” lack information linking the digital representation to material originals, namely bibliographic metadata and page breaks, which makes them almost impossible to employ for scholarly research.

With the GitHub-hosted TEI edition of Majallat al-Muqtabas we want to show that one can produce scholarly editions that remedy the shortcomings of either world with very little funding. This is possible by re-purposing available and well-established open software and by bridging the gap between immensely popular but non-academic (and, at least under US copyright laws, occasionally illegal) online libraries run by volunteers on the one hand and academic scanning efforts and editorial expertise on the other. We use digital texts from shamela.ws, transform them into TEI XML, add light structural mark-up for articles, sections, authors, and bibliographic metadata, and link them to facsimiles provided through the British Library’s “Endangered Archives Programme” and HathiTrust (in the process of which we also make first corrections to the transcription). The digital edition (TEI XML and a basic web display) is then hosted as a GitHub repository with a CC BY-SA 4.0 licence.

By linking images to the digital text, every reader can validate the quality of the transcription against the original, thus overcoming the greatest limitation of crowd-sourced or gray transcriptions and the main source of disciplinary contempt among historians and scholars of the Middle East. Improvements to the transcription and mark-up can be crowd-sourced with clear attribution of authorship and version control using git and GitHub’s core functionality. Editions will be referencable down to the word level for scholarly citations, annotation layers, as well as web-applications through a documented URI scheme.1 The web-display is implemented through a customised adaptation of the TEI Boilerplate XSLT stylesheets; it can be downloaded, distributed and run locally without any internet connection, which is a necessity for societies outside the global North. Finally, by sharing all our code (mostly XSLT) in addition to the XML files, we hope to facilitate similar projects and digital editions of further periodicals, namely Rashīd Riḍā’s al-Manār.

1. Scope and deliverables of the project

The purpose and scope of the project is to provide an open, collaborative, referencable, and scholarly digital edition of Muḥammad Kurd ʿAlī’s journal al-Muqtabas, which includes the full text, semantic mark-up, bibliographic metadata, and digital imagery. All files but the digital facsimiles are hosted on GitHub.

All deliverables and milestones will be covered in more detail in the following sections.

1.1 Deliverables

The project will open avenues for re-purposing code for similar projects, i.e. for transforming full-text transcriptions from some HTML or XML source, such as al-Maktaba al-Shamela, into TEI P5 XML, linking them to digital imagery from other open repositories, such as EAP and HathiTrust, and generating a web display by, for instance, adapting the code base of TEI Boilerplate.

The most likely candidates for such follow-up projects are

1.2 Milestones

  1. Design a basic TEI schema; done
  2. Import everything from shamela and convert it to TEI XML; done
  3. Improve TEI mark-up: The following steps are independent of each other, need to be done on the issue / file level, and can be distributed across different editors.
    • Bibliographic metadata: volume, issue, page range
    • Basic structural mark-up: sections, articles, heads, authors
    • Page breaks and links to facsimiles
  4. Adapt TEI Boilerplate to the needs of Arabic periodicals; done
  5. Write XSLT to extract bibliographic metadata for every article in al-Muqtabas
    1. Basic: BibTeX; done
    2. Advanced: MODS; done
  6. Presentations to solicit feedback
    • Leipzig, Germany, December 2015: DH Workshop Persian and Arabic
    • Cologne, Germany, March 2016: DiXiT Convention 2
    • Cairo, Egypt, April 2016: Webinar at AUC library
    • Beirut, Lebanon, May 2016: Conference “Books in Motion”
    • Krakow, Poland, August 2016: DH2016 cancelled (due to parental leave)

1.3 Timeline / scheduled releases

There is no proper release schedule yet but I conceive of version 1.0 as the first complete edition.

2. The journal al-Muqtabas

Muḥammad Kurd ʿAlī published the monthly journal al-Muqtabas between 1906 and 1917/18. After the Young Turk Revolution of July 1908, publication moved from Cairo to Damascus in the journal’s third year.

2.1 Publication schedule and number of issues

There is some confusion as to the counting of issues and their publication dates. Samir Seikaly argues that Muḥammad Kurd ʿAlī was wrong in stating in his memoirs that he had published 8 volumes of 12 issues each and two independent issues.2 But the actual hard copies at the Orient-Institut Beirut and the digital facsimiles from HathiTrust and EAP show that Kurd ʿAlī was right insofar as volume 9 existed and comprised 2 issues only. As it turns out, al-Muqtabas also published a number of double issues: Vol. 4 no. 5/6 and Vol. 8 no. 11/12.

According to the masthead and the cover sheet, al-Muqtabas’s publication schedule followed the Islamic hijrī calendar (from the journal itself it must remain open whether the recorded publication dates were the actual publication dates). Sometimes the printers made errors: no. 4/2, for instance, carries Rab I 1327 aH as publication date on the cover sheet, but Ṣaf 1327 aH in its masthead. The latter would correspond to the official publication schedule. External sources, such as reports in the daily newspaper al-Muqtabas, also published by Muḥammad Kurd ʿAlī in Damascus, indicate that the actual publication sequence was indeed not too tightly tied to the monthly publication schedule already at the very beginning:

Thamarāt al-Funūn 29 Jan. 1906 / 4 Dhu II 1323 aH (#1548) announced the publication of no.2 of Majallat al-Muqtabas two months earlier than the date indicated in the issue’s masthead: Ṣafar 1324 aH.3 This also contradicts Muḥammad Kurd ʿAlī’s memoirs, where he states that publication of Muqtabas commenced in early 1324 aH.4

By summer 1909, al-Muqtabas was lagging severely behind its publication schedule. No. 4/7, scheduled for Rajab 1327 aH (Jul/Aug 1909), was published only in the first week of April 1910, but it took only another week for No. 4/8 to appear.5 No. 4/9 and 4/10 were then published within another month.6 Nevertheless, they were still rather late (the latter should have been published in Shawwāl 1327 aH (Oct/Nov 1909)). No. 4/12 (Dhu II 1327 aH [Dec 1909/ Jan 1910]) was published at the beginning of July 1910, some seven months behind schedule.7

Muḥammad Kurd ʿAlī and his staff seemingly entered another productive period in fall 1910, publishing eight “monthly” issues in three months:

Production seemingly followed a monthly pattern as it took slightly less than six months to publish the next six issues (5/11 - 6/4):

Similar confusion shrouds the end of publication in mystery. As no publication dates were actually provided in the mastheads of individual issues and since the issue wrappers were mostly discarded upon binding the issues into volumes, the only surviving explicit dating is provided by the volumes’ cover sheets. According to its cover sheet, volume 8 was published through 1332 aH, i.e. between December 1913 and November 1914. However, an article in issue 9 reports on the inauguration of a mosque in Berlin at the end of Ram 1333 aH (mid-August 1915). The same issue reviews a number of books printed in 1333 aH and one book published in 1334 aH in addition to the announcement of a publication on the “first year of the war”, which would mean that it was not published before November 1915 (Muḥarram 1334 aH) and most likely in 1916. Issue 12 of the same volume even recorded a notebook for the year 1917 in the section on new books and publications, which also listed further books published in 1334 aH. The following issue continues this trend by publishing an obituary for Shaykh ʿAbd al-Razzāq al-Bayṭār, who died in Damascus on 10 Rab 1335 [4 January 1917].

2.2 Moving publication between Cairo and Damascus

After the constitutional revolution in the Ottoman Empire in July 1908, Muḥammad Kurd ʿAlī moved al-Muqtabas from Cairo to Damascus with the publication of no. 4/1. The final page of no. 7/6 announced a (temporary) move back to Cairo.

2.3 Known editions

In addition to the original edition, at least one reprint appeared: In 1992 Dār Ṣādir in Beirut published a facsimile edition, which is entirely unmarked as such but for the information on the binding itself. A check of this reprint against the original confirmed that it is a facsimile: pagination, font, and layout are all identical. But as Samir Seikaly remarked in 1981 that he used “two separate compilations of al-Muqtabas […] in this study”, there must be at least one other print edition that I have not yet seen.13

3. Input:

3.1 Digital imagery

Image files are available from the al-Aqṣā Mosque’s library in Jerusalem through the British Library’s “Endangered Archives Programme” (vols. 2-7), HathiTrust (vols. 1-6, 8), and Institut du Monde Arabe. Due to their open access licence, preference is given to the facsimiles from EAP.

3.1.1 EAP119

3.1.2 HathiTrust

Public Domain or Public Domain in the United States, Google-digitized: In addition to the terms for works that are in the Public Domain or in the Public Domain in the United States above, the following statement applies: The digital images and OCR of this work were produced by Google, Inc. (indicated by a watermark on each page in the PageTurner). Google requests that the images and OCR not be re-hosted, redistributed or used commercially. The images are provided for educational, scholarly, non-commercial purposes.

Note: There are no restrictions on use of text transcribed from the images, or paraphrased or translated using the images.

3.2 Full text

Somebody took the pains to create fully searchable text files and uploaded everything to al-Maktaba al-Shamela and WikiSource.

3.2.1 al-Maktaba al-Shāmila

3.2.2 WikiSource

Somebody uploaded the text from shamela to WikiSource. Unfortunately, it is impossible to browse the entire journal. Instead, one has to address each individual, consecutively numbered issue; e.g. Vol. 4, No. 1 is listed as No. 37.

4. Deliverable: TEI edition

The main challenge is to combine the full text and the images in a TEI edition. As al-Maktaba al-Shāmila did not reproduce page breaks true to the print edition, every single one of the more than 6,000 page breaks must be added manually and linked to the digital image of the page.

4.1 General design

The edition is conceived of as a corpus of TEI files that are grouped by means of XInclude. This way, volumes can be constructed as single TEI files containing a <group/> of TEI files and a volume-specific <front/> and <back/>.
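The grouping can be illustrated with a minimal Python sketch; it is not the project's own code. The file names, the sample content, and the choice to include whole issue files (rather than only their <text> elements, which a strict TEI schema may require inside <group>) are assumptions made for illustration. The XInclude references are resolved with the standard library's ElementInclude module and an in-memory loader.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"
XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical issue files, kept in memory so that the sketch is self-contained.
ISSUES = {
    "issue-1.xml": f'<TEI xmlns="{TEI_NS}"><text><body>'
                   f'<div type="article"><head>First sample article</head></div>'
                   f'</body></text></TEI>',
    "issue-2.xml": f'<TEI xmlns="{TEI_NS}"><text><body>'
                   f'<div type="article"><head>Second sample article</head></div>'
                   f'</body></text></TEI>',
}

# Hypothetical volume file: volume-specific front and back matter plus
# a group of XInclude references to the individual issues.
VOLUME = f"""<TEI xmlns="{TEI_NS}" xmlns:xi="{XI_NS}">
  <text>
    <front><div type="titlePage"/></front>
    <group>
      <xi:include href="issue-1.xml"/>
      <xi:include href="issue-2.xml"/>
    </group>
    <back><div type="index"/></back>
  </text>
</TEI>"""


def load_issue(href, parse, encoding=None):
    """Loader for ElementInclude: resolve an href against the in-memory files."""
    return ET.fromstring(ISSUES[href])


def build_volume(volume_xml):
    """Parse a volume file and replace every xi:include with the referenced issue."""
    volume = ET.fromstring(volume_xml)
    ElementInclude.include(volume, loader=load_issue)
    return volume


volume = build_volume(VOLUME)
group = volume.find(f".//{{{TEI_NS}}}group")
heads = [head.text for head in volume.iter(f"{{{TEI_NS}}}head")]
print(len(group), heads)
```

With real files, the loader argument can be dropped so that ElementInclude reads the referenced files from disk.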

A detailed description of and notes on the mark-up are kept in a separate file (documentation_tei-markup.md).

4.2 Quality control

A simple way of controlling the quality of the basic structural mark-up would be to cross-check any automatically generated table of contents or index against the published tables of contents at the end of each volume and against the index of al-Muqtabas published by Riyāḍ ʿAbd al-Ḥamīd Murād in 1977.
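Such a cross-check could look roughly like the following Python sketch. It assumes that every article or section is a <div> with a direct <head> child and that the published table of contents has been typed up as a plain list of titles; the sample titles are placeholders, and real Arabic headings would probably need some normalisation before comparison.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def extract_heads(tei_xml):
    """Collect the whitespace-normalised <head> of every <div> in document order."""
    root = ET.fromstring(tei_xml)
    heads = []
    for div in root.iter(f"{{{TEI_NS}}}div"):
        head = div.find(f"{{{TEI_NS}}}head")
        if head is not None:
            heads.append(" ".join("".join(head.itertext()).split()))
    return heads


def cross_check(generated, published):
    """Return titles missing from the mark-up and titles absent from the published list."""
    missing = [title for title in published if title not in generated]
    unexpected = [title for title in generated if title not in published]
    return missing, unexpected


# Placeholder data standing in for one issue and its published table of contents.
SAMPLE_ISSUE = f"""<TEI xmlns="{TEI_NS}"><text><body>
  <div type="article"><head>First  article</head></div>
  <div type="article"><head>Second article</head></div>
</body></text></TEI>"""
PUBLISHED_TOC = ["First article", "Third article"]

missing, unexpected = cross_check(extract_heads(SAMPLE_ISSUE), PUBLISHED_TOC)
print("missing from mark-up:", missing)
print("not in published table of contents:", unexpected)
```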

4.3 To do

  1. Mark-up: The basic structural mark-up of individual issues is far from complete. All features encoded in HTML by shamela.ws have been translated into TEI XML, but these are limited to the main article / section headers. What needs to be done is:
    • splitting articles into sections and sections into individual articles
    • mark-up of authors with <byline>
  2. Text-image linking: While the links to the facsimiles can be generated automatically for each issue, establishing page breaks (<pb>) must be done manually for all 6,000+ of them.
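The automatic part of the text-image linking can be sketched in Python as follows. The URL template, the page and image numbers, and the use of the @facs attribute to point directly at an image URL are assumptions for illustration, not the project's actual scheme. The sketch only produces the <pb> elements; placing each of them at the correct position in the text remains manual work.

```python
import xml.etree.ElementTree as ET


def make_page_breaks(first_page, last_page, first_image, url_template, edition="print"):
    """Create one <pb> per printed page, each pointing at the matching facsimile image.

    Assumes that consecutive pages correspond to consecutive image numbers.
    """
    page_breaks = []
    for offset, page in enumerate(range(first_page, last_page + 1)):
        attributes = {
            "ed": edition,
            "n": str(page),
            "facs": url_template.format(image=first_image + offset),
        }
        page_breaks.append(ET.Element("pb", attributes))
    return page_breaks


# Hypothetical issue covering pages 65-67, whose first page is image no. 10.
page_breaks = make_page_breaks(
    first_page=65,
    last_page=67,
    first_image=10,
    url_template="https://example.org/facsimiles/{image:04d}.jpg",
)
for pb in page_breaks:
    print(ET.tostring(pb, encoding="unicode"))
```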

5. Deliverable: A web display adapting TEI Boilerplate

To allow a quick review of the mark-up and a convenient reading of the journal’s content, I decided to customise TEI Boilerplate for a first display of the TEI files in the browser without need for pre-processed HTML. This heavily customised Boilerplate view is hosted as another GitHub repository so that it can be re-used.

The webview provides a parallel display of either online or local facsimiles and the text of al-Muqtabas. It includes a fully functional table of contents, stable links to all section and article heads, and links to bibliographic metadata for every article. For a first impression see al-Muqtabas 6(2).

webview of *al-Muqtabas* 6(2)

A detailed description of the web display is available here.

Users will, of course, want to search the edition for specific terms and will immediately notice the lack of a dedicated search field in the webview. Individual issues can nevertheless be searched for literal strings through the browser’s built-in search function; just hit ctrl+f (Windows) or cmd+f (Macintosh). To search across the entire periodical, there are three instantly available options:

Search the GitHub repository
Search Google with the `site:` operator

6. Deliverable: Bibliographic metadata / index

Bibliographic metadata for every article in Majallat al-Muqtabas is provided in two formats: BibTeX and MODS in the sub-folder metadata. The metadata includes a URL pointing to the webview of this item and the webview, in turn, includes links to the metadata files for every article.

6.1 BibTeX

BibTeX is a plain text format which has been around for more than 30 years and which is widely supported by reference managers. Thus it seems to be a safe bet to preserve and exchange minimal bibliographic data. The repository currently contains two XSLT stylesheets to automatically generate BibTeX files from the TEI source:

  1. Tei2BibTex-articles.xsl: generates one BibTeX file for each article and section of a periodical issue.
  2. Tei2BibTex-issues.xsl: generates one BibTeX file per periodical issue, comprising entries for every article and section.
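The project's stylesheets are written in XSLT; the following Python sketch merely illustrates the kind of mapping involved and is not a port of them. It assumes that an article is a <div> with a <head> and a <byline>, and the citation key and field values are placeholders.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def text_of(element):
    """Return the whitespace-normalised text content of an element, or '' if absent."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def article_to_bibtex(div, key, journal, volume, number):
    """Map one article <div> plus issue-level information to a BibTeX @article entry."""
    fields = {
        "author": text_of(div.find(f"{{{TEI_NS}}}byline")),
        "title": text_of(div.find(f"{{{TEI_NS}}}head")),
        "journal": journal,
        "volume": volume,
        "number": number,
    }
    # Empty fields (e.g. unsigned articles) are left out of the entry.
    body = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in fields.items() if value
    )
    return f"@article{{{key},\n{body}\n}}"


# Placeholder article; key, title and author are invented for the example.
SAMPLE_ARTICLE = (
    f'<div xmlns="{TEI_NS}" type="article">'
    f"<head>Example title</head><byline>Example Author</byline><p>Text</p></div>"
)

entry = article_to_bibtex(
    ET.fromstring(SAMPLE_ARTICLE),
    key="muqtabas-example",
    journal="al-Muqtabas",
    volume="6",
    number="2",
)
print(entry)
```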

There are, however, a number of problems with the format:

6.2 MODS (Metadata Object Description Schema)

The MODS standard is expressed in XML and maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users. Compared to BibTeX, MODS has the advantage of being properly standardised, human and machine readable, and much better suited to include all the needed bibliographic information.

The repository currently contains two XSLT stylesheets to automatically generate MODS XML files from the TEI source:

  1. Tei2Mods-articles.xsl: generates one MODS file for each article and section of a periodical issue.
  2. Tei2Mods-issues.xsl: generates one MODS file per periodical issue, comprising entries for every article and section.
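To give an impression of the target format, the following Python sketch builds a small MODS record for a single article. It is an illustration only, not the output of the project's XSLT; the selection of elements is an assumption, and all values (title, author, page range) are placeholders.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def m(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def article_to_mods(title, author, journal, volume, issue, first_page, last_page):
    """Build a MODS record for one article hosted in a periodical issue."""
    mods = ET.Element(m("mods"))
    title_info = ET.SubElement(mods, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = title
    name = ET.SubElement(mods, m("name"), {"type": "personal"})
    ET.SubElement(name, m("namePart")).text = author

    # The periodical is described as the host of the article.
    host = ET.SubElement(mods, m("relatedItem"), {"type": "host"})
    host_title = ET.SubElement(host, m("titleInfo"))
    ET.SubElement(host_title, m("title")).text = journal
    part = ET.SubElement(host, m("part"))
    for kind, number in (("volume", volume), ("issue", issue)):
        detail = ET.SubElement(part, m("detail"), {"type": kind})
        ET.SubElement(detail, m("number")).text = str(number)
    extent = ET.SubElement(part, m("extent"), {"unit": "pages"})
    ET.SubElement(extent, m("start")).text = str(first_page)
    ET.SubElement(extent, m("end")).text = str(last_page)
    return mods


# Placeholder values for illustration.
record = article_to_mods("Example title", "Example Author", "al-Muqtabas", 6, 2, 65, 72)
print(ET.tostring(record, encoding="unicode"))
```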

MODS also serves as the intermediary format for the free bibutils suite of conversions between bibliographic metadata formats (including BibTeX) which is under constant development and released under a GNU/GPL (General Public License). Tei2Mods-issues.xsl and bibutils provide a means to automatically generate a large number of bibliographic formats to suit the reference manager one is working with; e.g.:

6.3 Index by means of a Zotero group

As the webview or reading edition is implemented on the issue level and as we currently have no plans to implement and host a database on the backend, Digital Muqtabas needed a way to navigate and browse all articles, authors etc. To this end, we have set up the public Zotero group “Digital Muqtabas” (bibliographic metadata). It can be updated by means of the MODS or BibTeX files. Updating from MODS is the preferred method since it is the more expressive format.

Zotero groups are a great way to share bibliographic metadata. Hosted by the Roy Rosenzweig Center for History and Media, they allow for public access to structured bibliographic metadata through a web interface. Of course they also integrate with the free and open-source reference manager Zotero. All one needs is a free Zotero account and either the Zotero plug-in for the Firefox and Chrome browsers or the Zotero standalone version for Mac OSX and Linux. One can then join the group and sync all data to the local installation of Zotero, which means that, similar to the webview and all other components of this edition, bibliographic metadata can be browsed and searched through a graphical user interface without a continuous internet connection.

  1. Currently we provide stable URLs down to the paragraph level. For more details see the documentation of the mark-up 

  2. {KurdʿAlī 1928@424}, {Seikaly 1981@128} 

  3. {tf-oib 1548@5} 

  4. {KurdʿAlī 1928@415} 

  5. {muqtabas 76-eap@3}. It is important to note that an article in 4/2 scheduled for February/March 1909 was referenced in the newspaper al-Quds #60 of 11 June 1909

  6. {muqtabas 100-eap@3} 

  7. {muqtabas 128-eap@3} 

  8. {muqtabas 184-eap@3} 

  9. {muqtabas 188-eap@3} 

  10. {muqtabas 200-eap@3}. al-Ḥaqāʾiq 1(10), 1 Jum I 1329 aH, 17 Nīs 1327 R [30 Apr. 1911]:369-74 replied to an article in this issue of al-Muqtabas

  11. {muqtabas 274-eap@3} 

  12. {muqtabas 283-eap@3} 

  13. {Seikaly 1981@128} 

  14. Wikipedia has a better description than the official website.