From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation
This work addresses the challenge of improving translation systems for under-resourced African languages. The contribution is incremental, as it applies known techniques to a specific domain.
This study tackled the problem of low-resource machine translation for African languages by applying data augmentation techniques, resulting in a minimum 25% increase in BLEU scores across six languages.
The linguistic diversity of the African continent presents distinct challenges and opportunities for machine translation. This study explores the effect of data augmentation on translation systems for low-resource African languages. We focus on two data augmentation techniques, sentence concatenation with back translation and switch-out, and apply them across six African languages. Our experiments show significant improvements in machine translation performance, with a minimum increase of 25% in BLEU score across all six languages. We provide a comprehensive analysis and highlight the potential of these techniques to support more robust translation systems for under-resourced languages.
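As a rough illustration of the two augmentation techniques named above, the following Python sketch shows one simplified way they could operate on tokenized parallel data. It is not the authors' implementation: the function names `concat_augment` and `switch_out`, the fixed per-token replacement rate, and the toy data are assumptions made for illustration. In particular, the published SwitchOut method samples the number of replaced words from a temperature-controlled distribution, whereas this sketch uses a simpler independent replacement probability. The back-translation step (producing synthetic source sentences with a reverse translation model) is assumed to have happened before `concat_augment` is called.

```python
import random


def concat_augment(pairs, n_new, rng):
    """Build n_new synthetic pairs by joining two randomly chosen sentence pairs.

    `pairs` is a list of (source, target) strings. In a back-translation
    setting, the source side would be synthetic output of a reverse model
    (assumption: that step is done elsewhere).
    """
    augmented = []
    for _ in range(n_new):
        (src_a, tgt_a), (src_b, tgt_b) = rng.sample(pairs, 2)
        augmented.append((src_a + " " + src_b, tgt_a + " " + tgt_b))
    return augmented


def switch_out(tokens, vocab, rate, rng):
    """Simplified switch-out: replace each token with a random vocabulary word.

    Each token is replaced independently with probability `rate`
    (a simplification of the temperature-based sampling in SwitchOut).
    """
    return [rng.choice(vocab) if rng.random() < rate else tok for tok in tokens]


# Toy placeholder data, not taken from the paper.
rng = random.Random(0)
pairs = [
    ("src one", "tgt one"),
    ("src two", "tgt two"),
    ("src three", "tgt three"),
]
vocab = ["alpha", "beta", "gamma"]

extra_pairs = concat_augment(pairs, n_new=2, rng=rng)
noisy_source = switch_out("src one two three".split(), vocab, rate=0.3, rng=rng)
```

In a training pipeline, the concatenated pairs would typically be added to the original parallel corpus, while switch-out would be applied on the fly to both the source and target side of each batch. The replacement rate and the number of synthetic pairs are tuning choices that the abstract does not specify.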