CL May 25, 2023

Towards Higher Pareto Frontier in Multilingual Machine Translation

Yichong Huang, Xiaocheng Feng, Xinwei Geng, Baohang Li, Bing Qin

arXiv:2305.15718v126.5224 citations Has Code

Originality: Highly original

AI Analysis

This paper addresses the problem of balancing translation quality across languages in multilingual machine translation systems, representing an incremental advance over existing balancing strategies.

The paper tackles the Pareto optimization challenge in multilingual machine translation, where improving some languages can degrade others. It proposes Pareto Mutual Distillation (Pareto-MD), which aims to push the Pareto frontier outward rather than trade off along it, and reports gains of up to +2.46 BLEU over baselines.

Multilingual neural machine translation has witnessed remarkable progress in recent years. However, the long-tailed distribution of multilingual corpora poses a challenge of Pareto optimization, i.e., optimizing for some languages may come at the cost of degrading the performance of others. Existing balancing training strategies are equivalent to a series of Pareto optimal solutions, which trade off on a Pareto frontier. In this work, we propose a new training framework, Pareto Mutual Distillation (Pareto-MD), towards pushing the Pareto frontier outwards rather than making trade-offs. Specifically, Pareto-MD collaboratively trains two Pareto optimal solutions that favor different languages and allows them to learn from the strengths of each other via knowledge distillation. Furthermore, we introduce a novel strategy to enable stronger communication between Pareto optimal solutions and broaden the applicability of our approach. Experimental results on the widely-used WMT and TED datasets show that our method significantly pushes the Pareto frontier and outperforms baselines by up to +2.46 BLEU.
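
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the two models differ only in the temperature used for temperature-based data sampling (a common balancing strategy) and that each model's loss is a convex combination of cross-entropy and a KL term toward its peer's output distribution. The function names, the weight `alpha`, and the toy numbers are all hypothetical.

```python
import math

def sampling_probs(sizes, temperature):
    """Temperature-based sampling: p_i is proportional to n_i ** (1 / T).

    T = 1 follows the raw data distribution (favors high-resource languages);
    a larger T flattens it (favors low-resource languages).
    """
    weights = [n ** (1.0 / temperature) for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(probs, target):
    """Negative log-likelihood of the reference token index."""
    return -math.log(probs[target])

def kl_divergence(p, q):
    """KL(p || q) for two distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def mutual_distillation_loss(student_logits, peer_logits, target, alpha):
    """Assumed loss for one model: (1 - alpha) * CE + alpha * KL(peer || student).

    The peer's distribution is treated as a fixed target here. In mutual
    distillation the same loss is applied symmetrically with the roles swapped.
    """
    student = softmax(student_logits)
    peer = softmax(peer_logits)
    ce = cross_entropy(student, target)
    kd = kl_divergence(peer, student)
    return (1.0 - alpha) * ce + alpha * kd

# Toy corpus sizes for one high-resource and one low-resource language pair.
sizes = [1_000_000, 10_000]
probs_model_1 = sampling_probs(sizes, temperature=1.0)  # favors high-resource
probs_model_2 = sampling_probs(sizes, temperature=5.0)  # favors low-resource

# Toy next-token logits from each model for a single target position.
logits_1 = [2.0, 0.5, -1.0]
logits_2 = [1.0, 1.5, -0.5]
loss_1 = mutual_distillation_loss(logits_1, logits_2, target=0, alpha=0.5)
loss_2 = mutual_distillation_loss(logits_2, logits_1, target=0, alpha=0.5)
```

In this sketch each model sees a differently balanced data mix, so each sits at a different point on the trade-off curve, and the KL term lets each one borrow from the other on the languages where the peer is stronger. How the distillation weight is set per language, and the stronger communication strategy mentioned in the abstract, are not specified there and are left out here.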
