CL | May 14, 2024

LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages

arXiv:2405.08997v2 | 29 citations | h-index: 6 | AMERICASNLP
Originality: Highly original
AI Analysis

This paper addresses machine translation for critically endangered languages that have no available data, with the aim of supporting language education and revitalization efforts.

The authors tackle machine translation for no-resource languages by proposing LLM-RBMT, a paradigm that combines rule-based methods with large language models. They apply it to build the first translator for Owens Valley Paiute, a critically endangered Indigenous American language with virtually no publicly available data.

We propose a new paradigm for machine translation that is particularly useful for no-resource languages (those without any publicly available bilingual or monolingual corpora): LLM-RBMT (LLM-Assisted Rule Based Machine Translation). Using the LLM-RBMT paradigm, we design the first language education/revitalization-oriented machine translator for Owens Valley Paiute (OVP), a critically endangered Indigenous American language for which there is virtually no publicly available data. We present a detailed evaluation of the translator's components: a rule-based sentence builder, an OVP to English translator, and an English to OVP translator. We also discuss the potential of the paradigm, its limitations, and the many avenues for future research that it opens up.

