Till Grallert
Scholarly Makerspace
Humboldt-Universität zu Berlin, Universitätsbibliothek, Grimm-Zentrum
Combining Social and Cultural with Digital History: Methodological Challenges and Practical Strategies
2023-03-07
36.20124, 37.16117
<gap/>
<gap/>
Arabic periodicals | until 1918 | until 1929 |
---|---|---|
published | 2054 | 3550 |
known holdings | 540 | 775 |
% of published | 26.29 | 21.83 |
digitized | 156 | 233 |
% of published | 7.59 | 6.56 |
multiple digitizations | 51 | 66 |
% of published | 2.48 | 1.86 |
% of digitized | 32.69 | 28.33 |
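Every percentage in this table derives from the raw counts. All shares are taken of the number of published periodicals except the last row, which is taken of the digitized ones. The sketch below recomputes them. The dictionary keys and function name are illustrative only.

```python
# Raw counts copied from the table above, keyed by cut-off year.
counts = {
    "1918": {"published": 2054, "known holdings": 540, "digitized": 156, "multiple digitizations": 51},
    "1929": {"published": 3550, "known holdings": 775, "digitized": 233, "multiple digitizations": 66},
}


def shares(c: dict) -> dict:
    """Return the table's percentages, rounded to two decimals."""
    published = c["published"]
    return {
        "holdings % of published": round(100 * c["known holdings"] / published, 2),
        "digitized % of published": round(100 * c["digitized"] / published, 2),
        "multiple % of published": round(100 * c["multiple digitizations"] / published, 2),
        "multiple % of digitized": round(100 * c["multiple digitizations"] / c["digitized"], 2),
    }


for year, c in counts.items():
    print(year, shares(c))
```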
corpus | Arabic periodicals (1798–1918) | WWI as mirrored by Hessian regional papers |
---|---|---|
community | c. 420 million Arabic speakers | c. 6.2 million inhabitants |
periodicals | 2054 newspapers and journals | 125 newspapers |
digitized | 156 periodicals | 125 newspapers with more than 1.5 million pages |
type | mostly facsimiles | facsimiles and full text |
access | paywalls, geo-fencing | open access |
interface | mostly foreign languages only | local and foreign languages |
No Arabic script
Which Latinized transcription was used?
What are the normalization rules for the search algorithm?
Cataloging rules and algorithmic copyright detection create further barriers to access
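The normalization question matters because a single title can be romanized in several ways, such as "al-Ḥaqāʾiq", "al-Haqa'iq" or "al-haqaiq". The rule below is a minimal sketch of one possible normalization. It is assumed for illustration and is not taken from any particular catalogue or search engine. It decomposes characters, drops combining diacritics and the ʿayn/hamza signs along with their common ASCII stand-ins, and lowercases the result.

```python
import unicodedata

# Hypothetical folding rule: ʿ (U+02BF), ʾ (U+02BE) and typical stand-ins are dropped.
APOSTROPHES = {"\u02bf", "\u02be", "'", "\u2018", "\u2019", "`"}


def normalize(term: str) -> str:
    """Fold an IJMES-style romanization to plain lowercase ASCII for matching."""
    decomposed = unicodedata.normalize("NFKD", term)  # Ḥ -> H + combining dot below
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in APOSTROPHES
    ]
    return "".join(kept).lower()


print(normalize("al-Ḥaqāʾiq"))  # al-haqaiq
print(normalize("al-Haqa'iq"))  # al-haqaiq
```

A search interface that applies a rule like this to both the index and the query would match all three spellings. Without documentation of the rule actually used, users cannot know which spellings they have missed.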
For old prints, there’s […] kraken/calamari for coders, Transkribus if you’ve got money and just want to have the results[,] and OCR-D if you’ve got an IT department.
(Winkler Mastodon post 2023)
<gap/>
Title | Place | Proprietor | DOI | Volumes | Issues | Articles | Words |
---|---|---|---|---|---|---|---|
al-Ḥaqāʾiq | Damascus | Abd al-Qādir al-Iskandarānī | 10.5281/zenodo.1232016 | 3 | 35 | 389 | 298090 |
al-Ḥasnāʾ | Beirut | Niqūlā Bāz | 10.5281/zenodo.3556246 | 1 | 12 | 201 | NA |
al-Manār | Cairo | Muḥammad Rashīd Riḍā | NA | 35 | 537 | 4300 | 6144593 |
al-Muqtabas | Cairo, Damascus | Muḥammad Kurd ʿAlī | 10.5281/zenodo.597319 | 9 | 96 | 2964 | 1981081 |
al-Ustādh | Cairo | Abdallāh Nadīm al-Idrīsī | 10.5281/zenodo.3581028 | 1 | 42 | 435 | 221447 |
al-Zuhūr | Cairo | Anṭūn al-Jumayyil | 10.5281/zenodo.3580606 | 4 | 39 | 436 | 292333 |
Lughat al-ʿArab | Baghdad | Anastās Mārī al-Karmalī | 10.5281/zenodo.3514384 | 3 | 34 | 939 | 373832 |
total | | | | 56 | 795 | 9664 | 9311376 |
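The totals row can be checked against the individual rows. The word total only adds up if al-Ḥasnāʾ, for which no word count is given (NA), is skipped. The sketch below encodes NA as `None`. The variable names are illustrative.

```python
# (title, volumes, issues, articles, words) from the table above; None = NA.
rows = [
    ("al-Ḥaqāʾiq", 3, 35, 389, 298090),
    ("al-Ḥasnāʾ", 1, 12, 201, None),
    ("al-Manār", 35, 537, 4300, 6144593),
    ("al-Muqtabas", 9, 96, 2964, 1981081),
    ("al-Ustādh", 1, 42, 435, 221447),
    ("al-Zuhūr", 4, 39, 436, 292333),
    ("Lughat al-ʿArab", 3, 34, 939, 373832),
]


def column_total(index: int) -> int:
    """Sum one numeric column, skipping missing values (NA)."""
    return sum(row[index] for row in rows if row[index] is not None)


columns = ["volumes", "issues", "articles", "words"]
totals = {name: column_total(i) for i, name in enumerate(columns, start=1)}
print(totals)  # {'volumes': 56, 'issues': 795, 'articles': 9664, 'words': 9311376}
```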
Hypothesis: distribution of geographic origin of contributions to a periodical is an indicator for its importance
<byline>
<placeName ref="oape:place:9 geon:268064">صيدا</placeName>
<persName ref="oape:pers:2845">مريم زكا</persName>
</byline>
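One way to operationalize the hypothesis is to count bylines per place, using the authority IDs in the `ref` attribute rather than the Arabic place name. The sketch below makes several assumptions. The first byline is the one shown above, while the second and third are hypothetical and added only to show the counting. The TEI namespace is omitted. The `prefix:id` reading of `ref` is inferred from the values shown here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy bylines; only the first is taken from the source, the others are hypothetical.
BYLINES = """<bylines>
<byline><placeName ref="oape:place:9 geon:268064">صيدا</placeName><persName ref="oape:pers:2845">مريم زكا</persName></byline>
<byline><placeName ref="oape:place:9 geon:268064">صيدا</placeName></byline>
<byline><placeName ref="oape:place:216 geon:98182">بغداد</placeName></byline>
</bylines>"""


def parse_ref(ref: str) -> dict:
    """Split 'oape:place:9 geon:268064' into {'oape:place': '9', 'geon': '268064'}."""
    return dict(token.rsplit(":", 1) for token in ref.split())


def places_of_origin(xml_text: str) -> Counter:
    """Count bylines per place, keyed by the OpenArabicPE place ID (fallback: name)."""
    counts = Counter()
    for place in ET.fromstring(xml_text).iter("placeName"):
        ids = parse_ref(place.get("ref", ""))
        counts[ids.get("oape:place", place.text)] += 1
    return counts


print(places_of_origin(BYLINES))  # Counter({'9': 2, '216': 1})
```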
<place type="town" xml:id="place_9">
<placeName type="simple">Saida</placeName>
<placeName xml:lang="ar-Latn-x-ijmes">Ṣaydā</placeName>
<placeName xml:lang="en">Sidon</placeName>
<placeName xml:lang="ar">صيدا</placeName>
<location>
<geo>33.55751, 35.37148</geo>
</location>
<idno type="url">http://en.wikipedia.org/wiki/Sidon</idno>
<idno type="geon">268064</idno>
<idno type="oape">9</idno>
</place>
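The place IDs in the bylines resolve against authority-file entries like the one above, which supply coordinates for mapping. The sketch below reads that entry with Python's standard library. It assumes that `geo` holds "latitude, longitude", which is the TEI default, and it omits the TEI namespace. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

# The place entry shown above (TEI namespace omitted).
PLACE = """<place type="town" xml:id="place_9">
<placeName type="simple">Saida</placeName>
<placeName xml:lang="ar-Latn-x-ijmes">Ṣaydā</placeName>
<placeName xml:lang="en">Sidon</placeName>
<placeName xml:lang="ar">صيدا</placeName>
<location><geo>33.55751, 35.37148</geo></location>
<idno type="url">http://en.wikipedia.org/wiki/Sidon</idno>
<idno type="geon">268064</idno>
<idno type="oape">9</idno>
</place>"""

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def place_record(xml_text: str) -> dict:
    """Collect names (keyed by xml:lang, else @type), coordinates and IDs."""
    place = ET.fromstring(xml_text)
    names = {p.get(XML_LANG, p.get("type")): p.text for p in place.findall("placeName")}
    lat, lon = (float(v) for v in place.findtext("location/geo").split(","))
    ids = {i.get("type"): i.text for i in place.findall("idno")}
    return {"names": names, "lat": lat, "lon": lon, "ids": ids}


print(place_record(PLACE)["names"]["en"])  # Sidon
```

Joining these records on the `oape` ID with the byline counts gives a table of places, coordinates and numbers of contributions, ready to be mapped.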
والأصح الدرعية بلام التعريف (راجع <bibl subtype="journal" type="periodical">مجلة <title level="j" ref="oape:bibl:3 oclc:1034545644">الزهور</title> المصرية <biblScope unit="volume" from="2" to="2">٢</biblScope> : <biblScope unit="page" from="292">٢٩٢</biblScope></bibl>)
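In this citation, the `from` attributes on `biblScope` carry machine-readable numbers, while the text keeps the Eastern Arabic digits of the original. The sketch below extracts the cited periodical, its authority IDs, and the volume and page. When `from` is missing it falls back to the displayed digits, because Python's `int()` accepts any Unicode decimal digits. The TEI namespace is omitted and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# The <bibl> element from the sentence above (TEI namespace omitted).
BIBL = ('<bibl subtype="journal" type="periodical">مجلة '
        '<title level="j" ref="oape:bibl:3 oclc:1034545644">الزهور</title> المصرية '
        '<biblScope unit="volume" from="2" to="2">٢</biblScope> : '
        '<biblScope unit="page" from="292">٢٩٢</biblScope></bibl>')


def citation(xml_text: str) -> dict:
    """Return title, authority IDs and volume/page of a periodical citation."""
    bibl = ET.fromstring(xml_text)
    title = bibl.find("title")
    ids = dict(token.rsplit(":", 1) for token in title.get("ref", "").split())
    scope = {}
    for s in bibl.findall("biblScope"):
        # Prefer the normalized @from; int() also parses Eastern Arabic digits.
        scope[s.get("unit")] = int(s.get("from") or s.text)
    return {"title": title.text, "ids": ids, **scope}


print(citation(BIBL))
```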
وانتخب <persName>فؤاد أفندي الدفتري البغدادي</persName> و<bibl><editor><persName>نوري أفندي</persName></editor> راس كتاب <textLang otherLangs="ota">القسم التركي</textLang> في <bibl type="periodical" subtype="newspaper">جريدة <title ref="oape:bibl:532">الزهور</title></bibl> البغدادية</bibl> نائبين عن <placeName ref="oape:place:372 geon:94824">كربلاء</placeName>.
<person>
<persName><roleName type="pseudonym">ساتسنا</roleName></persName>
<persName><roleName type="pseudonym">أمكح</roleName></persName>
<persName><roleName type="pseudonym">فهر الجابري</roleName></persName>
<persName><roleName type="rank">الأب</roleName> <forename>أنستاس</forename> <forename>ماري</forename> <surname><addName type="nisbah">الكرملي</addName></surname></persName>
<persName><forename>أنستاس</forename> <forename>ماري</forename> <addName type="nisbah">الألياوي</addName> <surname><addName type="nisbah">الكرملي</addName></surname></persName>
<persName><forename>بطرس</forename> <addName type="nasab">بن <forename>جبرائيل</forename></addName> <forename>يوسف</forename> <surname>عواد</surname></persName>
<idno type="VIAF">39370998</idno>
<idno type="oape">227</idno>
<idno type="wiki">Q4751824</idno>
<birth><date source="viaf" when="1866-08-05">1866-08-05</date> in <placeName ref="oape:place:216 geon:98182">Baghdad</placeName></birth>
<death><date source="viaf" when="1947-01-07">1947-01-07</date> in <placeName ref="oape:place:216 geon:98182">Baghdad</placeName></death>
</person>
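Because the entry records pseudonyms as `persName` elements containing a `roleName type="pseudonym"`, articles signed with a pseudonym can be attributed to the same person before any further analysis. The sketch below works on an abridged copy of the entry above. It separates pseudonyms from other name forms and collects the authority IDs. The TEI namespace is omitted and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged from the person entry above (TEI namespace omitted).
PERSON = """<person>
<persName><roleName type="pseudonym">ساتسنا</roleName></persName>
<persName><roleName type="pseudonym">أمكح</roleName></persName>
<persName><roleName type="rank">الأب</roleName> <forename>أنستاس</forename> <forename>ماري</forename> <surname><addName type="nisbah">الكرملي</addName></surname></persName>
<idno type="VIAF">39370998</idno>
<idno type="oape">227</idno>
<idno type="wiki">Q4751824</idno>
</person>"""


def person_summary(xml_text: str) -> dict:
    """Separate pseudonyms from other name forms and collect authority IDs."""
    person = ET.fromstring(xml_text)
    names, pseudonyms = [], []
    for pn in person.findall("persName"):
        text = " ".join("".join(pn.itertext()).split())  # flatten, collapse whitespace
        if pn.find("roleName[@type='pseudonym']") is not None:
            pseudonyms.append(text)
        else:
            names.append(text)
    ids = {i.get("type"): i.text for i in person.findall("idno")}
    return {"names": names, "pseudonyms": pseudonyms, "ids": ids}


print(person_summary(PERSON)["ids"])  # {'VIAF': '39370998', 'oape': '227', 'wiki': 'Q4751824'}
```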
About 4/5 of all articles, or 2/3 of all words, carry no byline
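The two shares differ because they are weighted differently. If roughly 80% of the articles hold roughly 67% of the words, unsigned articles average about 0.67/0.8 ≈ 0.83 of the mean article length, while signed ones average about 0.33/0.2 ≈ 1.67. In other words, signed articles are on average roughly twice as long. The sketch below computes both shares from TEI-like markup. The three-article issue is hypothetical. Treating each `div type="article"` as an article and counting words only inside `p` are assumptions, not necessarily how the project encodes or counts them.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy issue: three articles, one with a byline (TEI namespace omitted).
ISSUE = """<body>
<div type="article"><head>A</head><p>one two three four five six</p></div>
<div type="article"><byline><persName>X</persName></byline><head>B</head>
<p>one two three four five six seven eight nine ten eleven twelve</p></div>
<div type="article"><head>C</head><p>one two</p></div>
</body>"""


def word_count(div: ET.Element) -> int:
    """Count whitespace-separated words in the paragraphs of one article."""
    return sum(len("".join(p.itertext()).split()) for p in div.iter("p"))


def unsigned_shares(xml_text: str) -> tuple:
    """Return (share of articles, share of words) without a <byline>."""
    articles = ET.fromstring(xml_text).findall(".//div[@type='article']")
    unsigned = [d for d in articles if d.find(".//byline") is None]
    total_words = sum(word_count(d) for d in articles)
    return (len(unsigned) / len(articles),
            sum(word_count(d) for d in unsigned) / total_words)


print(unsigned_shares(ISSUE))  # (0.6666666666666666, 0.4)
```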
The authorial signal is strongest in the most frequent words, i.e. function words
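The standard way to exploit this signal is Burrows's Classic Delta. It z-scores the relative frequencies of the n most frequent words (MFW) and takes the mean absolute difference of these z-scores between two texts, so that smaller values mean more similar usage. The sketch below is a plain-Python illustration of that measure and not the `stylo` implementation. It assumes that the compared texts themselves serve as the reference corpus and uses population standard deviations. The toy texts are hypothetical romanized function words.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens: list) -> dict:
    """Relative frequency of each token in one text."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def burrows_delta(texts: dict, n_mfw: int = 100) -> dict:
    """Pairwise Classic Delta over the n most frequent words of the corpus."""
    corpus = Counter(tok for toks in texts.values() for tok in toks)
    mfw = [w for w, _ in corpus.most_common(n_mfw)]
    freqs = {name: relative_frequencies(toks) for name, toks in texts.items()}

    # z-score each MFW across all texts; words with zero spread contribute 0
    z = {name: {} for name in texts}
    for w in mfw:
        values = [freqs[name].get(w, 0.0) for name in texts]
        mu, sigma = mean(values), pstdev(values)
        for name in texts:
            z[name][w] = (freqs[name].get(w, 0.0) - mu) / sigma if sigma else 0.0

    # Delta = mean absolute difference of z-scores for each pair of texts
    names = list(texts)
    return {
        (a, b): mean(abs(z[a][w] - z[b][w]) for w in mfw)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


texts = {  # hypothetical toy texts
    "known_author": "wa fi min wa ila an".split(),
    "anonymous": "wa fi min wa ila ala".split(),
    "other": "ala an an fi ala ala".split(),
}
print(burrows_delta(texts, n_mfw=5))
```

With real data the number of MFW, culling and the choice of distance are exactly the parameters one would test systematically.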
stylo package (Eder, Rybicki, and Kestemont, “Stylometry with R”, 2016)
stylo() settings
tidygraph and igraph
ggraph and ggplot2
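To turn stylometric distances into a network, a common approach links each text to its nearest neighbours. The sketch below is an assumed, simplified scheme and not stylo's own consensus-network procedure. Each text adds weight 1/rank to the edges to its k closest texts, and the result is written as a `from,to,weight` edge list that R's network packages can read (igraph's `graph_from_data_frame()` takes the first two columns as edge endpoints). The distance values in the test are hypothetical.

```python
import csv
import io


def nearest_neighbour_edges(distances: dict, k: int = 1) -> list:
    """Link each text to its k nearest neighbours; edge weight accumulates 1/rank."""
    names = sorted({n for pair in distances for n in pair})

    def dist(a, b):
        # distances are stored once per unordered pair
        return distances.get((a, b), distances.get((b, a)))

    edges = {}
    for a in names:
        neighbours = sorted((dist(a, b), b) for b in names if b != a)
        for rank, (_, b) in enumerate(neighbours[:k], start=1):
            key = tuple(sorted((a, b)))
            edges[key] = edges.get(key, 0.0) + 1 / rank
    return [(a, b, w) for (a, b), w in sorted(edges.items())]


def to_csv(edges: list) -> str:
    """Serialize the edge list with a from,to,weight header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["from", "to", "weight"])
    writer.writerows(edges)
    return buf.getvalue()


example = {("A", "B"): 0.5, ("A", "C"): 1.2, ("B", "C"): 0.9}  # hypothetical distances
print(to_csv(nearest_neighbour_edges(example, k=1)))
```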
Multiple anonymous candidates?
Authorship of Anastās Mārī al-Karmalī and Kāẓim al-Duyalī is more likely
Contributors to Project Jarāʾid: Hala Auji, Philippe Chevrant, Marina Demetriadou, Lamia Eid, Stacy Fahrenthold, Ulrike Freitag, Rana Issa, Nicole Khayat, Peter Magierski, Leyla von Mende, Adam Mestyan, Christian Meier, Daniel Newman, Geoffrey Roper, Sinai Rusinek, Philip Sadgrove, Ola Seif, and Rogier Visser
Contributors to OpenArabicPE: Jasper Bernhofer, Dimitar Dragnev, Patrick Funk, Talha Güzel, Hans Magne Jaatun, Jakob Koppermann, Xaver Kretzschmar, Daniel Lloyd, Klara Mayer, Tobias Sick, Manzi Tanna-Händel, and Layla Youssef
Maxim Romanov for his work on parameter testing
Contributors to OCR: Adam Mestyan, Sinai Rusinek
Links:
Licence: slides and plots are licensed under CC BY-SA 4.0
2. Social network analysis