CL · AI · AP · May 9

From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages

arXiv:2605.09147 · 30.0 · Has Code
Predicted impact: top 40% in CL · last 90 days · Originality: Synthesis-oriented
AI Analysis

Provides empirical guidance for applying modern neural methods to historical NLP tasks, particularly for under-resourced medieval languages in digital humanities.

This paper evaluates LLMs for POS tagging on three medieval Romance languages, finding that fine-tuned LLMs consistently outperform traditional taggers, with cross-lingual transfer learning significantly benefiting under-resourced varieties.

Part-of-speech (POS) tagging for medieval Romance languages remains challenging due to orthographic variation, morphological complexity, and limited annotated resources. This paper presents a systematic empirical evaluation of large language models (LLMs) for POS tagging across three medieval varieties: Medieval Occitan, Medieval Catalan, and Medieval French. We compare traditional rule-based and statistical taggers with modern open-source LLMs under four settings: zero-shot prompting, few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning. Experiments on historically grounded datasets show that LLM-based approaches consistently outperform traditional taggers, with fine-tuning and multilingual training yielding the largest improvements. In particular, cross-lingual transfer learning substantially benefits under-resourced varieties, while targeted bilingual training can outperform broader multilingual configurations for specific target languages. The results highlight the importance of linguistic proximity and dataset characteristics when designing transfer strategies for historical NLP. These findings offer empirical insight into the applicability of modern neural methods to medieval text processing, as well as practical guidance for deploying LLM-based POS tagging pipelines in digital humanities research. All code, models, and processed datasets are released for reproducibility.
