CL · Apr 19, 2024

TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages

arXiv:2404.12845v226.0102 citations · h-index: 3 · SIGTYP

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of applying modern NLP tools to historical languages. The contribution is incremental, since it builds on existing adapter methods.

The authors adapted pre-trained language models to ancient and historical languages for tasks such as morphological annotation and POS-tagging, placing second overall and first in word-level gap-filling in the SIGTYP 2024 shared task.

We present our submission to the unconstrained subtask of the SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages for morphological annotation, POS-tagging, lemmatization, character- and word-level gap-filling. We developed a simple, uniform, and computationally lightweight approach based on the adapters framework using parameter-efficient fine-tuning. We applied the same adapter-based approach uniformly to all tasks and 16 languages by fine-tuning stacked language- and task-specific adapters. Our submission obtained an overall second place out of three submissions, with the first place in word-level gap-filling. Our results show the feasibility of adapting language models pre-trained on modern languages to historical and ancient languages via adapter training.
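The idea of stacking a language adapter and a task adapter can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, which relies on the adapters framework. It only shows the general shape of a bottleneck adapter (down-projection, non-linearity, up-projection, residual connection) and how two such adapters might be applied in sequence to one hidden state. The hidden size of 768, the bottleneck size of 48, the zero-initialised up-projection, and all function names are assumptions made for illustration.

```python
import random

def make_adapter(hidden, bottleneck, rng):
    """Build a toy bottleneck adapter (biases omitted for brevity)."""
    down = [[rng.gauss(0, 0.02) for _ in range(hidden)] for _ in range(bottleneck)]
    # Assumption: a zero-initialised up-projection, so the adapter starts as the identity.
    up = [[0.0] * bottleneck for _ in range(hidden)]
    return {"down": down, "up": up}

def apply_adapter(adapter, h):
    """Down-project, apply ReLU, up-project, then add the residual."""
    z = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in adapter["down"]]
    delta = [sum(w * x for w, x in zip(row, z)) for row in adapter["up"]]
    return [x + d for x, d in zip(h, delta)]

def apply_stack(adapters, h):
    """Apply adapters in order, e.g. language adapter first, task adapter second."""
    for adapter in adapters:
        h = apply_adapter(adapter, h)
    return h

def num_params(adapter):
    """Count the weights in one adapter (the only part that would be trained)."""
    return sum(len(row) for row in adapter["down"]) + sum(len(row) for row in adapter["up"])

rng = random.Random(0)
HIDDEN, BOTTLENECK = 768, 48  # illustrative sizes, not taken from the paper

language_adapter = make_adapter(HIDDEN, BOTTLENECK, rng)  # hypothetical per-language adapter
task_adapter = make_adapter(HIDDEN, BOTTLENECK, rng)      # hypothetical per-task adapter

hidden_state = [rng.gauss(0, 1) for _ in range(HIDDEN)]
output = apply_stack([language_adapter, task_adapter], hidden_state)

print(num_params(language_adapter))  # weights per adapter in this toy setup
```

In a setup like this, only the adapter weights would be updated during fine-tuning while the pre-trained model stays frozen, which is what makes the approach parameter-efficient.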
