CL, May 23, 2022

Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore

arXiv:2205.11370v2583 citations, h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of digitizing and standardizing historical Gaelic texts for linguists and historians, though it is incremental: it applies existing methods to a new dataset.

The paper tackles the problem of transliterating a 16th-century Scottish Gaelic manuscript into standardized orthography using Transformer-based models. Its best model, a BART architecture pre-trained on Scottish Gaelic Wikipedia and fine-tuned on around 2,000 word-level parallel examples, achieves a character-level BLEU score of 54.15.

The Book of the Dean of Lismore (BDL) is a 16th-century Scottish Gaelic manuscript written in a non-standard orthography. In this work, we outline the problem of transliterating the text of the BDL into a standardised orthography, and perform exploratory experiments using Transformer-based models for this task. In particular, we focus on the task of word-level transliteration, and achieve a character-level BLEU score of 54.15 with our best model, a BART architecture pre-trained on the text of Scottish Gaelic Wikipedia and then fine-tuned on around 2,000 word-level parallel examples. Our initial experiments give promising results, but we highlight the shortcomings of our model, and discuss directions for future work.
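To make the reported metric concrete, the following is a minimal sketch of a corpus-level, character-level BLEU computation. It assumes BLEU up to 4-grams over characters with a standard brevity penalty and no smoothing; the paper's exact BLEU configuration is not given here, so the function name `char_bleu`, the `max_n` default, and the sample strings are illustrative assumptions, not the authors' evaluation code.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return a Counter of the character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU over characters, on a 0-100 scale.

    Assumption: unsmoothed BLEU with uniform weights over 1..max_n-grams.
    Clipped n-gram matches are pooled over all word pairs, combined by a
    geometric mean, and scaled by the brevity penalty.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Counter & Counter keeps the minimum count per n-gram (clipping).
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Placeholder strings, not data from the manuscript.
predicted = ["abcdx"]
standardised = ["abcde"]
score = char_bleu(predicted, standardised)
```

Because the score is computed over characters rather than words, a prediction that differs from the target word by a single letter still earns partial credit, which suits word-level transliteration where an exact-match metric would count it as a complete miss.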

