CLIR, Jun 22, 2020

MedLatinEpi and MedLatinLit: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts

arXiv:2006.12289v2, 5 citations
Originality: Synthesis-oriented
AI Analysis

This work provides new datasets for researchers in computational linguistics and digital humanities, enabling authorship analysis of medieval Latin texts. Its contribution is incremental, since it focuses on data creation and baseline methods.

The authors introduce MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts for computational authorship analysis, consisting of 294 and 30 curated texts respectively. They report experimental results for the authorship verification task and apply the verification system to two medieval epistles whose authorship is disputed.

We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification. Along with the datasets we provide experimental results, obtained on these datasets, for the authorship verification task, i.e., the task of predicting whether a text of unknown authorship was written by a candidate author or not. We also make available the source code of the authorship verification system we have used, thus allowing our experiments to be reproduced, and to be used as baselines, by other researchers. We also describe the application of the above authorship verification system, using these datasets as training data, for investigating the authorship of two medieval epistles whose authorship has been disputed by scholars.
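
To make the authorship verification task concrete, the following is a minimal sketch of a verifier that answers "was this text written by the candidate author or not?". It is not the authors' system: the character n-gram profile, the cosine similarity, the threshold value, and the function names `char_ngrams`, `cosine`, and `verify` are all assumptions chosen for illustration, and the strings in the usage note are toy placeholders rather than texts from MedLatinEpi or MedLatinLit.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(candidate_texts, unknown_text, threshold=0.5, n=3) -> bool:
    """Return True if the unknown text is attributed to the candidate author.

    The candidate's profile is the sum of n-gram counts over their known texts.
    The threshold is an illustrative assumption, not a tuned value.
    """
    profile = Counter()
    for text in candidate_texts:
        profile.update(char_ngrams(text, n))
    return cosine(profile, char_ngrams(unknown_text, n)) >= threshold
```

A call such as `verify(["known text one", "known text two"], "text of unknown authorship")` returns a yes/no decision for a single candidate author. A trained classifier over richer stylometric features would normally replace the fixed threshold; this sketch only shows the shape of the task.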

