CL | AI | CY | Apr 8, 2022

MMTAfrica: Multilingual Machine Translation for African Languages

MILA
arXiv:2204.04306v1652 citations | h-index: 18 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses low-resource translation for African languages. The advance is incremental because it builds on existing multilingual frameworks.

The paper tackles multilingual machine translation for African languages by introducing MMTAfrica, a many-to-many system for six African and two non-African languages, achieving spBLEU gains from +0.58 to +19.46 on FLORES 101 benchmarks.

In this paper, we focus on the task of multilingual machine translation for African languages and describe our contribution in the 2021 WMT Shared Task: Large-Scale Multilingual Machine Translation. We introduce MMTAfrica, the first many-to-many multilingual translation system for six African languages: Fon (fon), Igbo (ibo), Kinyarwanda (kin), Swahili/Kiswahili (swa), Xhosa (xho), and Yoruba (yor) and two non-African languages: English (eng) and French (fra). For multilingual translation concerning African languages, we introduce a novel backtranslation and reconstruction objective, BT&REC, inspired by the random online back translation and T5 modeling framework respectively, to effectively leverage monolingual data. Additionally, we report improvements from MMTAfrica over the FLORES 101 benchmarks (spBLEU gains ranging from +0.58 in Swahili to French to +19.46 in French to Xhosa). We release our dataset and source code at https://github.com/edaiofficial/mmtafrica.

Code Implementations: 1 repo