CL · Oct 21, 2022

University of Cape Town's WMT22 System: Multilingual Machine Translation for Southern African Languages

arXiv:2210.11757v1290 citations · h-index: 12
Originality: Synthesis-oriented
AI Analysis

This work addresses the limited translation resources available for Southern African languages. The contribution is incremental: it applies established low-resource techniques to a new multilingual dataset.

The paper tackles machine translation for low-resource Southern African languages with a single multilingual model covering English and 8 South and South East African languages. It uses techniques such as back-translation and synthetic training data to improve performance, especially in directions with little or no bilingual training data.

The paper describes the University of Cape Town's submission to the constrained track of the WMT22 Shared Task: Large-Scale Machine Translation Evaluation for African Languages. Our system is a single multilingual translation model that translates between English and 8 South / South East African Languages, as well as between specific pairs of the African languages. We used several techniques suited for low-resource machine translation (MT), including overlap BPE, back-translation, synthetic training data generation, and adding more translation directions during training. Our results show the value of these techniques, especially for directions where very little or no bilingual training data is available.
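To make the back-translation idea concrete, here is a minimal sketch of how synthetic training pairs are typically built from monolingual target-side text. It is an illustration under stated assumptions, not the authors' pipeline: `reverse_translate` is a hypothetical stand-in for a target-to-source model, and the `<2xxx>` target-language tag is an assumed convention for steering a single multilingual model that the abstract does not confirm.

```python
from typing import Callable, Iterable, List, Tuple


def tag_source(sentence: str, target_lang: str) -> str:
    """Prefix a source sentence with a target-language token.

    The "<2xxx>" format is an assumption made for this sketch.
    """
    return f"<2{target_lang}> {sentence}"


def back_translate(
    monolingual_target: Iterable[str],
    reverse_translate: Callable[[str], str],
    target_lang: str,
) -> List[Tuple[str, str]]:
    """Build (synthetic source, real target) pairs from monolingual target text.

    reverse_translate is a hypothetical callable wrapping a target-to-source
    model. The target side of every pair is genuine text; only the source
    side is machine-generated.
    """
    pairs: List[Tuple[str, str]] = []
    for target_sentence in monolingual_target:
        synthetic_source = reverse_translate(target_sentence)
        # Skip sentences the reverse model could not translate.
        if not synthetic_source.strip():
            continue
        pairs.append((tag_source(synthetic_source, target_lang), target_sentence))
    return pairs
```

Calling `back_translate(sentences, model_fn, "zul")` would yield pairs that can be mixed with the genuine bilingual data. Because the target side stays human-written, the model still learns to produce fluent output in the low-resource language even when the synthetic source side is noisy.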

