CL · Apr 25, 2025

Building UD Cairo for Old English in the Classroom

arXiv:2504.18718v2 · 2 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of building linguistic resources for low-resource historical languages such as Old English. Its contribution is incremental: it builds on the existing UD Cairo sentences and on classroom methods.

The authors create a sample treebank for Old English by collecting and annotating sentences in a classroom setting. They find that post-editing LLM outputs and combining beginner annotations can yield good results. Their preliminary parsing experiments perform poorly on Old English overall, but improve when parsing on annotated features (lemma, hyperlemma, gloss).

In this paper we present a sample treebank for Old English based on the UD Cairo sentences, collected and annotated as part of a classroom curriculum in Historical Linguistics. To collect the data, a sample of 20 sentences illustrating a range of syntactic constructions in the world's languages, we employ a combination of LLM prompting and searches in authentic Old English data. For annotation we assigned sentences to multiple students with limited prior exposure to UD, whose annotations we compare and adjudicate. Our results suggest that while current LLM outputs in Old English do not reflect authentic syntax, this can be mitigated by post-editing, and that although beginner annotators do not possess enough background to complete the task perfectly, taken together they can produce good results and learn from the experience. We also conduct preliminary parsing experiments using Modern English training data, and find that although performance on Old English is poor, parsing on annotated features (lemma, hyperlemma, gloss) leads to improved performance.
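The abstract says that several students annotated each sentence and that their annotations were compared and adjudicated, but it does not say how. One simple way to combine them is a per-token majority vote that flags ties for a human adjudicator. The sketch below assumes that scheme; the function name `majority_vote`, the `(head, deprel)` tuple representation, and the sample annotations are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def majority_vote(annotations):
    """Merge several annotations of one sentence.

    Each annotation is a list of (head, deprel) pairs, one per token.
    Returns (merged, disputed): merged holds the strict-majority pair for
    each token, or None where no strict majority exists; disputed lists
    the 0-based token indices that still need manual adjudication.
    """
    n_tokens = len(annotations[0])
    if any(len(a) != n_tokens for a in annotations):
        raise ValueError("all annotations must cover the same tokens")

    merged, disputed = [], []
    for i in range(n_tokens):
        votes = Counter(a[i] for a in annotations)
        label, count = votes.most_common(1)[0]
        if count * 2 > len(annotations):   # strict majority
            merged.append(label)
        else:                              # tie: leave for the adjudicator
            merged.append(None)
            disputed.append(i)
    return merged, disputed

# Hypothetical annotations of a three-token sentence (heads are 1-based, 0 = root).
student_a = [(2, "nsubj"), (0, "root"), (2, "obj")]
student_b = [(2, "nsubj"), (0, "root"), (2, "obl")]
student_c = [(2, "nsubj"), (0, "root"), (2, "obj")]

merged, disputed = majority_vote([student_a, student_b, student_c])
```

With three annotators, the single disagreement on the last token is resolved in favour of `obj`. With only `student_a` and `student_b`, the same token has no strict majority and is returned in `disputed`. In a classroom setting the disputed tokens are also the natural ones to discuss with the students.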

