CL · Oct 18, 2022

Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages

Wenxiang Jiao, Zhaopeng Tu, Jiarui Li, Wenxuan Wang, Jen-tse Huang, Shuming Shi

Peking U, Tencent

arXiv:2210.09644v124.2296 citations · h-index: 48 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work improves translation for under-resourced African languages, though it is incremental as it builds on existing methods within a competition framework.

The paper tackles multilingual machine translation for African languages in the WMT22 shared task. It addresses missing training data for some language pairs, data imbalance across pairs, and the curse of multilinguality with data augmentation, distributionally robust optimization, and language family grouping, and its submissions placed first on the blind test sets.

This paper describes Tencent's multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages. We participated in the $\mathbf{constrained}$ translation track, in which only the data and pretrained models provided by the organizer are allowed. The task is challenging due to three problems: the absence of training data for some to-be-evaluated language pairs, the uneven optimization of language pairs caused by data imbalance, and the curse of multilinguality. To address these problems, we adopt data augmentation, distributionally robust optimization, and language family grouping, respectively, to develop our multilingual neural machine translation (MNMT) models. Our submissions won $\mathbf{1st\ place}$ on the blind test sets in terms of the automatic evaluation metrics. Code, models, and detailed competition results are available at https://github.com/wxjiao/WMT2022-Large-Scale-African.
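To make the idea of distributionally robust optimization over imbalanced language pairs more concrete, here is a minimal sketch of a generic group-reweighting step. It is not taken from the paper or its repository: the exponentiated-gradient update, the function names, the step size, and the per-pair loss values are all assumptions chosen for illustration. The language-pair labels are only examples.

```python
import math


def update_group_weights(weights, losses, step_size=0.1):
    """One exponentiated-gradient step (a generic group-DRO style update).

    Language pairs with a higher current loss receive a larger share of
    the total weight; the result is renormalized to sum to 1.
    """
    scaled = {
        pair: weights[pair] * math.exp(step_size * losses[pair])
        for pair in weights
    }
    total = sum(scaled.values())
    return {pair: value / total for pair, value in scaled.items()}


def robust_loss(weights, losses):
    """Weighted training objective: sum of weight * loss over language pairs."""
    return sum(weights[pair] * losses[pair] for pair in weights)


# Hypothetical per-pair validation losses (illustrative numbers only).
losses = {"eng-hau": 2.0, "eng-swh": 1.0, "eng-zul": 4.0}

# Start from uniform weights, then apply a few reweighting steps.
weights = {pair: 1.0 / len(losses) for pair in losses}
for _ in range(5):
    weights = update_group_weights(weights, losses)

print(weights)
print(robust_loss(weights, losses))
```

In a real training loop the losses would be re-estimated after each model update, so the weights would track whichever language pairs are currently lagging rather than a fixed set of numbers as in this sketch.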
