CL · May 5, 2025

Data Augmentation With Back translation for Low Resource languages: A case of English and Luganda

arXiv:2505.02463v1 · 4 citations · h-index: 6 · NLP · IR
Originality: Incremental advance
AI Analysis

This work addresses translation challenges for low-resource languages such as Luganda. It is an incremental advance, since it applies known techniques to a new language pair.

The paper tackles low-resource neural machine translation for English-Luganda by using back translation to generate synthetic training data, and reports results that exceed previous benchmarks by more than 10 BLEU points across all translation directions.

In this paper, we explore the application of back translation (BT) as a semi-supervised technique to enhance Neural Machine Translation (NMT) models for the English-Luganda language pair, specifically addressing the challenges faced by low-resource languages. The purpose of our study is to demonstrate how BT can mitigate the scarcity of bilingual data by generating synthetic data from monolingual corpora. Our methodology involves developing custom NMT models using both publicly available and web-crawled data, and applying iterative and incremental back translation techniques. We strategically select datasets for incremental back translation across multiple small datasets, which is a novel element of our approach. The results of our study show significant improvements, with translation performance for the English-Luganda pair exceeding previous benchmarks by more than 10 BLEU score units across all translation directions. Additionally, our evaluation incorporates comprehensive assessment metrics such as SacreBLEU, ChrF2, and TER, providing a nuanced understanding of translation quality. Our research confirms the efficacy of BT when strategically curated datasets are used, establishing new performance benchmarks and demonstrating the potential of BT for enhancing NMT models for low-resource languages.
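The iterative back translation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `train` callback, the `toy_train` stand-in, and all other names here are hypothetical, and a real system would train NMT models rather than memorize sentence pairs. The sketch only shows how synthetic pairs from monolingual text are fed back into training in both directions.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                    # (source sentence, target sentence)
Translator = Callable[[str], str]         # maps one sentence to its translation
Trainer = Callable[[List[Pair]], Translator]


def back_translate(reverse_model: Translator, target_mono: List[str]) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    The target side is genuine text; the source side is machine-generated.
    """
    return [(reverse_model(t), t) for t in target_mono]


def iterative_back_translation(
    train: Trainer,
    parallel: List[Pair],
    mono_src: List[str],
    mono_tgt: List[str],
    rounds: int = 2,
) -> Tuple[Translator, Translator]:
    """Alternate between generating synthetic data and retraining both directions."""
    reverse_parallel = [(t, s) for s, t in parallel]
    fwd = train(parallel)             # source -> target
    bwd = train(reverse_parallel)     # target -> source
    for _ in range(rounds):
        # Each direction is augmented with data produced by the opposite model.
        synth_fwd = back_translate(bwd, mono_tgt)
        synth_bwd = back_translate(fwd, mono_src)
        fwd = train(parallel + synth_fwd)
        bwd = train(reverse_parallel + synth_bwd)
    return fwd, bwd


def toy_train(pairs: List[Pair]) -> Translator:
    """Hypothetical stand-in for NMT training: memorizes pairs, echoes unknown input."""
    table = dict(pairs)
    return lambda sentence: table.get(sentence, sentence)
```

In practice `train` would wrap a full NMT training run, and the incremental variant mentioned in the abstract would add monolingual datasets one at a time instead of all at once.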

