CL · AI · Jul 27, 2025

AI-Driven Generation of Old English: A Framework for Low-Resource Languages

arXiv:2507.20111v1
Originality: Incremental advance
AI Analysis

This work supports the preservation of historical languages such as Old English as part of our cultural and linguistic heritage, and offers a scalable blueprint for other low-resource languages.

The paper tackles the generation of text in Old English, a low-resource language. Its framework combines large language models with parameter-efficient fine-tuning and data augmentation, raising the BLEU score for English-to-Old English translation from 26 to over 65.

Preserving ancient languages is essential for understanding humanity's cultural and linguistic heritage, yet Old English remains critically under-resourced, limiting its accessibility to modern natural language processing (NLP) techniques. We present a scalable framework that uses advanced large language models (LLMs) to generate high-quality Old English texts, addressing this gap. Our approach combines parameter-efficient fine-tuning (Low-Rank Adaptation, LoRA), data augmentation via backtranslation, and a dual-agent pipeline that separates the tasks of content generation (in English) and translation (into Old English). Evaluation with automated metrics (BLEU, METEOR, and CHRF) shows significant improvements over baseline models, with BLEU scores increasing from 26 to over 65 for English-to-Old English translation. Expert human assessment also confirms high grammatical accuracy and stylistic fidelity in the generated texts. Beyond expanding the Old English corpus, our method offers a practical blueprint for revitalizing other endangered languages, effectively uniting AI innovation with the goals of cultural preservation.
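The dual-agent split and the backtranslation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DualAgentPipeline`, the function `backtranslate`, and the toy word-for-word lexicon are all hypothetical names introduced here, and the `toy_*` stubs stand in for what would be fine-tuned LLM calls in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type: any function mapping text to text (e.g. a wrapped LLM call).
TextFn = Callable[[str], str]


@dataclass
class DualAgentPipeline:
    """Two separate agents: one writes content, the other translates it."""
    generate_english: TextFn   # agent 1: produces content in Modern English
    translate_to_oe: TextFn    # agent 2: translates that content into Old English

    def run(self, prompt: str) -> Tuple[str, str]:
        english = self.generate_english(prompt)
        old_english = self.translate_to_oe(english)
        return english, old_english


def backtranslate(oe_sentences: List[str], oe_to_english: TextFn) -> List[Tuple[str, str]]:
    """Build synthetic (English, Old English) training pairs from
    monolingual Old English text by translating it back into English."""
    return [(oe_to_english(sentence), sentence) for sentence in oe_sentences]


# --- Toy stand-ins (assumptions for illustration, not real models) ---
TOY_LEXICON = {"the": "se", "king": "cyning"}
TOY_REVERSE = {oe: en for en, oe in TOY_LEXICON.items()}


def toy_generate(prompt: str) -> str:
    # A real agent would condition on the prompt; this stub ignores it.
    return "the king"


def toy_translate(text: str) -> str:
    # Word-for-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_backtranslate(text: str) -> str:
    return " ".join(TOY_REVERSE.get(word, word) for word in text.split())


pipeline = DualAgentPipeline(toy_generate, toy_translate)
english, old_english = pipeline.run("Write a short phrase about a ruler.")
synthetic_pairs = backtranslate(["se cyning"], toy_backtranslate)
```

Keeping the two agents behind plain text-to-text functions means either one can be swapped independently, for example replacing only the translator with a LoRA-adapted model, without touching the content generator.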
