CL LG · Oct 21, 2020

Beyond English-Centric Multilingual Machine Translation

Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch

arXiv:2010.11125v1 · 22.7 · 1075 citations · Has Code

Originality Highly original

AI Analysis

This work addresses global translation needs by enabling direct translation between non-English language pairs. Rather than making incremental improvements to English-centric systems, it reduces reliance on English as a pivot language.

The authors tackle English-centric bias in multilingual machine translation by building a many-to-many model that translates directly between any pair of 100 languages. The model gains more than 10 BLEU on non-English directions while remaining competitive with the best single systems from WMT.

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric by training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.
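To make the scale of the many-to-many setting concrete, the following minimal sketch counts directed translation pairs for 100 languages and compares them with the subset that an English-centric setup can supervise. The function name `count_directions` and the placeholder language codes are illustrative assumptions, not part of the released scripts. Note that the abstract states the mined dataset covers thousands of directions, which is not necessarily every possible pair.

```python
from itertools import permutations


def count_directions(num_languages: int) -> dict:
    """Count directed translation pairs among `num_languages` languages.

    One language is assumed to be English ("en"); the rest are
    placeholder codes used only for counting (hypothetical names).
    """
    languages = ["en"] + [f"lang{i}" for i in range(1, num_languages)]

    # Every ordered (source, target) pair with source != target.
    all_pairs = list(permutations(languages, 2))

    # English-centric data only covers pairs where one side is English.
    english_centric = [(s, t) for s, t in all_pairs if "en" in (s, t)]

    return {
        "many_to_many": len(all_pairs),
        "english_centric": len(english_centric),
        "non_english": len(all_pairs) - len(english_centric),
    }


counts = count_directions(100)
print(counts)
```

With 100 languages there are 100 × 99 = 9,900 directed pairs, of which only 2 × 99 = 198 involve English. The remaining 9,702 directions are the ones an English-centric model must handle without direct supervision, for example by pivoting through English.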
