CL | LG | May 22, 2024

Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages

arXiv:2405.13350v2 | 1 citation | h-index: 13 | LatinX in AI at North American Chapter of the Association for Computational Linguistics Conference 2024
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of limited access to sacred texts for speakers of underrepresented languages, though it is incremental in applying an existing method to new data.

The study trains a ByT5-based model to translate the Bible into underrepresented languages and evaluates translation quality with BLEU scores and sample translations, arguing that the model can improve access to these texts.

This study presents the development and evaluation of a ByT5-based multilingual translation model tailored for translating the Bible into underrepresented languages. Utilizing the comprehensive Johns Hopkins University Bible Corpus, we trained the model to capture the intricate nuances of character-based and morphologically rich languages. Our results, measured by the BLEU score and supplemented with sample translations, suggest the model can improve accessibility to sacred texts. It effectively handles the distinctive biblical lexicon and structure, thus bridging the linguistic divide. The study also discusses the model's limitations and suggests pathways for future enhancements, focusing on expanding access to sacred literature across linguistic boundaries.
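As a rough illustration of why a byte-level model such as ByT5 suits scripts that subword vocabularies cover poorly, the sketch below mimics ByT5-style encoding in plain Python. It is not the authors' code: the function names `encode` and `decode` are hypothetical, the sample string is only an example of a non-Latin script and is not taken from the paper, and the id layout (three reserved ids, then one id per UTF-8 byte) is assumed to follow the publicly released ByT5 tokenizer.

```python
# Sketch of ByT5-style byte-level encoding; no vocabulary file is needed.
# Assumption: ids 0, 1, 2 are reserved for pad, end-of-sequence, and unknown,
# so each UTF-8 byte b maps to id b + 3.
NUM_SPECIAL = 3
EOS_ID = 1


def encode(text: str) -> list[int]:
    """Map text to one id per UTF-8 byte, then append the end-of-sequence id."""
    return [b + NUM_SPECIAL for b in text.encode("utf-8")] + [EOS_ID]


def decode(ids: list[int]) -> str:
    """Drop reserved ids and turn the remaining ids back into text."""
    raw = bytes(i - NUM_SPECIAL for i in ids if i >= NUM_SPECIAL)
    return raw.decode("utf-8", errors="ignore")


# Illustrative three-character string in Ethiopic script (3 bytes per character).
sample = "ሰላም"
ids = encode(sample)
print(len(sample), len(ids))  # characters vs. ids (bytes plus end-of-sequence)
print(decode(ids) == sample)  # the encoding round-trips without an unknown token
```

Any string in any script maps to ids without an out-of-vocabulary case, which is the property the abstract relies on. The cost is longer sequences: here three characters become ten ids.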

