CL · Oct 18, 2022

Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages

Peking University · Tencent
arXiv:2210.09644v1 · 296 citations · h-index: 48 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work improves translation for under-resourced African languages. It is incremental, however: it applies existing methods within a shared-task setting.

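The paper tackles multilingual machine translation for African languages in the WMT22 shared task. It addresses three problems: missing training data for some language pairs, data imbalance across pairs, and the curse of multilinguality. Using data augmentation, distributionally robust optimization, and language family grouping, the system took first place on the blind test sets according to the automatic evaluation metrics.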

This paper describes Tencent's multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages. We participated in the constrained translation track, in which only the data and pretrained models provided by the organizers are allowed. The task is challenging because of three problems: the absence of training data for some of the language pairs to be evaluated, the uneven optimization of language pairs caused by data imbalance, and the curse of multilinguality. To address these problems, we adopt data augmentation, distributionally robust optimization, and language family grouping, respectively, to develop our multilingual neural machine translation (MNMT) models. Our submissions won 1st place on the blind test sets in terms of the automatic evaluation metrics. Code, models, and detailed competition results are available at https://github.com/wxjiao/WMT2022-Large-Scale-African.
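The abstract names distributionally robust optimization as the remedy for uneven optimization across language pairs but does not spell out its formulation. The following is a minimal sketch that assumes a generic group-DRO scheme and is not necessarily the authors' exact method. Each language pair is treated as a group, and pairs with a higher current loss are up-weighted by an exponentiated-gradient step. The model then minimizes the weighted sum of per-pair losses instead of their plain average. The function names, pair labels, loss values, and step size are all hypothetical.

```python
import math
from typing import Dict


def update_group_weights(
    weights: Dict[str, float], losses: Dict[str, float], step_size: float
) -> Dict[str, float]:
    """One exponentiated-gradient ascent step on per-language-pair weights (hypothetical helper)."""
    # Subtracting the max loss keeps exp() from overflowing; it cancels after normalization.
    shift = max(losses[pair] for pair in weights)
    unnormalized = {
        pair: w * math.exp(step_size * (losses[pair] - shift))
        for pair, w in weights.items()
    }
    total = sum(unnormalized.values())
    # Pairs with a higher current loss end up with a larger share of the weight.
    return {pair: v / total for pair, v in unnormalized.items()}


def robust_loss(weights: Dict[str, float], losses: Dict[str, float]) -> float:
    """Weighted objective the model would minimize instead of the plain average."""
    return sum(weights[pair] * losses[pair] for pair in weights)


if __name__ == "__main__":
    # Hypothetical language-pair labels and made-up losses, for illustration only.
    pairs = ["eng-swh", "eng-wol", "fra-wol"]
    losses = {"eng-swh": 1.2, "eng-wol": 2.5, "fra-wol": 3.1}
    weights = {pair: 1.0 / len(pairs) for pair in pairs}
    for step in range(5):
        # In real training, losses would be recomputed on fresh batches each step.
        weights = update_group_weights(weights, losses, step_size=0.1)
        print(step, {p: round(w, 3) for p, w in weights.items()},
              round(robust_loss(weights, losses), 3))
```

Over the iterations the weights drift toward the pair with the highest loss (here the made-up "fra-wol" value). Under-optimized pairs therefore receive more of the training signal, and the step size controls how aggressively this reweighting happens.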

Code Implementations: 1 repo