CL | May 3, 2022

Unifying the Convergences in Multilingual Neural Machine Translation

Yichong Huang, Xiaocheng Feng, Xinwei Geng, Bing Qin

arXiv:2205.01620v224.0293 citations | h-index: 47 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a key bottleneck in multilingual NMT, improving training efficiency and performance for both low- and high-resource languages, though it is an incremental advance over existing methods.

The paper tackles the convergence inconsistency problem in multilingual neural machine translation, where different language pairs converge at different epochs, causing over-fitting for low-resource languages and under-fitting for high-resource ones. It proposes a Language-Specific Self-Distillation (LSSD) training strategy, which achieves state-of-the-art performance with consistent improvements across all language pairs on three datasets.

Although all-in-one-model multilingual neural machine translation (multilingual NMT, or MNMT) has achieved remarkable progress, the convergence inconsistency in joint training has been overlooked, i.e., different language pairs reach convergence at different epochs. As a result, the trained MNMT model over-fits low-resource language translations while under-fitting high-resource ones. In this paper, we propose a novel training strategy named LSSD (Language-Specific Self-Distillation), which alleviates the convergence inconsistency and helps MNMT models achieve their best performance on every language pair simultaneously. Specifically, LSSD selects the language-specific best checkpoint for each language pair to teach the current model on the fly. Furthermore, we systematically explore three sample-level manipulations of knowledge transfer. Experimental results on three datasets show that LSSD obtains consistent improvements across all language pairs and achieves state-of-the-art performance.
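
The idea of teaching the current model from per-language-pair best checkpoints can be illustrated with a minimal sketch. This is not the authors' implementation: the names `BestCheckpointTracker` and `distillation_loss`, the interpolation weight `alpha`, and the use of a KL term between teacher and student output distributions are all assumptions made for illustration, and the three sample-level manipulations mentioned in the abstract are not modeled here.

```python
import copy
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, gold_index):
    """Negative log-likelihood of the gold token under the student."""
    return -math.log(softmax(student_logits)[gold_index])


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) over one output distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class BestCheckpointTracker:
    """Hypothetical helper: remembers, per language pair, the model state
    with the lowest validation loss seen so far."""

    def __init__(self):
        self.best_loss = {}
        self.best_state = {}

    def update(self, pair, dev_loss, state):
        """Store a copy of `state` if it is the best so far for `pair`."""
        if pair not in self.best_loss or dev_loss < self.best_loss[pair]:
            self.best_loss[pair] = dev_loss
            self.best_state[pair] = copy.deepcopy(state)
            return True
        return False


def distillation_loss(student_logits, teacher_logits, gold_index, alpha=0.5):
    """Assumed objective: interpolate the usual cross-entropy with a KL term
    that pulls the student toward the language-specific teacher."""
    ce = cross_entropy(student_logits, gold_index)
    kd = kl_divergence(teacher_logits, student_logits)
    return (1 - alpha) * ce + alpha * kd
```

In a training loop under these assumptions, each validation step would call `tracker.update(pair, dev_loss, model_state)` for every language pair, and each training batch for a given pair would be scored with `distillation_loss`, using logits produced by that pair's stored best checkpoint as the teacher.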
