CL · LG · Apr 1, 2021

Low-Resource Neural Machine Translation for Southern African Languages

arXiv:2104.00366v2 · 18 citations
AI Analysis

This work addresses the scarcity of translation data for low-resource African languages. Its contribution is incremental: it applies existing methods to new language pairs.

The paper tackles low-resource neural machine translation for Southern African Bantu languages by comparing zero-shot, transfer, and multilingual learning. Multilingual learning achieves the best results. It improves BLEU by 9.9 over the English-to-isiZulu baseline and by more than 10 over the previous SOTA.
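Multilingual learning trains a single model on several language pairs at once. The summary does not say how the model is told which language to produce. A common approach, assumed here and not confirmed by the paper, is to prepend a target-language tag to every source sentence. This minimal sketch merges per-pair corpora that way. The `<2xx>` tag format and the names `tag_for` and `build_multilingual_corpus` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tag format (assumption): "<2zu>" means "translate into isiZulu".
def tag_for(lang: str) -> str:
    return f"<2{lang}>"


def build_multilingual_corpus(
    corpora: Dict[Tuple[str, str], List[Tuple[str, str]]],
) -> List[Tuple[str, str]]:
    """Merge several parallel corpora, keyed by (source_lang, target_lang),
    into one training set. Each source sentence is prefixed with a tag
    naming its target language."""
    merged: List[Tuple[str, str]] = []
    # Sort the language pairs so the merged order is deterministic.
    for (_src_lang, tgt_lang), pairs in sorted(corpora.items()):
        for src, tgt in pairs:
            merged.append((f"{tag_for(tgt_lang)} {src}", tgt))
    return merged


if __name__ == "__main__":
    # Toy one-sentence corpora; real corpora hold thousands of pairs.
    toy = {
        ("en", "zu"): [("Hello", "Sawubona")],
        ("en", "xh"): [("Hello", "Molo")],
    }
    for source, target in build_multilingual_corpus(toy):
        print(source, "->", target)
```

In a tagged setup like this, you can request a direction the model never saw during training by pairing a source sentence with an unseen target tag. That is one way zero-shot translation is commonly obtained. Whether the paper's zero-shot experiments work this way is not stated here.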

Low-resource African languages have not fully benefited from the progress in neural machine translation because of a lack of data. Motivated by this challenge, we compare zero-shot learning, transfer learning and multilingual learning on three Bantu languages (Shona, isiXhosa and isiZulu) and English. Our main target is English-to-isiZulu translation for which we have just 30,000 sentence pairs, 28% of the average size of our other corpora. We show the importance of language similarity on the performance of English-to-isiZulu transfer learning based on English-to-isiXhosa and English-to-Shona parent models whose BLEU scores differ by 5.2. We then demonstrate that multilingual learning surpasses both transfer learning and zero-shot learning on our dataset, with BLEU score improvements relative to the baseline English-to-isiZulu model of 9.9, 6.1 and 2.0 respectively. Our best model also improves the previous SOTA BLEU score by more than 10.
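All of the comparisons above are stated as BLEU differences. The minimal sketch below shows what a BLEU point measures. It makes several assumptions: a single reference per sentence, whitespace tokenization, uniform 4-gram weights, no smoothing, and scores scaled to 0-100. The paper's exact BLEU implementation and tokenization are not given here, so these numbers will not match its reported scores.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per hypothesis, uniform weights
    and no smoothing, scaled to 0-100."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the red mat"]))
```

On this 0-100 scale, the reported gains (9.9 for multilingual, 6.1 for transfer, 2.0 for zero-shot learning over the English-to-isiZulu baseline) are differences between two such corpus scores on the same test set. Production tools add standardized tokenization and optional smoothing, so BLEU values are only comparable when they are computed the same way.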
