CL · Apr 25, 2025

A UD Treebank for Bohairic Coptic

arXiv:2504.18386v2 · 3 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses a critical gap for linguists and historians studying Coptic dialects, but it is incremental as it extends existing treebank methods to a new dialect.

The paper tackles the lack of digital resources for Bohairic Coptic by creating the first syntactically annotated corpus for this dialect. Its parsing experiments show that Bohairic is a distinct variety from Sahidic Coptic: joint parsing achieves an accuracy of 85%, while cross-dialect parsing shows a 15% drop in performance.

Despite recent advances in digital resources for other Coptic dialects, especially Sahidic, Bohairic Coptic, the main Coptic dialect for pre-Mamluk, late Byzantine Egypt, and the contemporary language of the Coptic Church, remains critically under-resourced. This paper presents and evaluates the first syntactically annotated corpus of Bohairic Coptic, sampling data from a range of works, including Biblical text, saints' lives and Christian ascetic writing. We also explore some of the main differences we observe compared to the existing UD treebank of Sahidic Coptic, the classical dialect of the language, and conduct joint and cross-dialect parsing experiments, revealing the unique nature of Bohairic as a related, but distinct variety from the more often studied Sahidic.

