CL AI LG

Mar 17, 2019

The Missing Ingredient in Zero-Shot Neural Machine Translation

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

arXiv:1903.07091v1 9.0 121 citations

Originality Highly original

AI Analysis

This paper addresses the challenge of translating between unseen language pairs in multilingual NMT. The contribution is incremental, but it improves scalability and efficiency.

The paper tackles zero-shot translation in multilingual neural machine translation by diagnosing why existing models fail and proposing auxiliary losses that enforce representational invariance, achieving zero-shot performance on par with pivoting on WMT14 English-French-German.

Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French-German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.
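
The abstract does not spell out the form of the auxiliary losses. As a minimal sketch only, assuming one plausible choice (a cosine distance between mean-pooled encoder outputs for the two sides of a parallel sentence pair), an invariance term added to the translation loss could look like the following. The names `mean_pool`, `alignment_loss`, `total_loss`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math

def mean_pool(states):
    """Average a list of per-token vectors into a single sentence vector."""
    dim = len(states[0])
    return [sum(tok[i] for tok in states) / len(states) for i in range(dim)]

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def alignment_loss(src_states, tgt_states):
    """Hypothetical invariance penalty: small when the encoder maps a
    sentence and its translation to similar pooled representations."""
    return cosine_distance(mean_pool(src_states), mean_pool(tgt_states))

def total_loss(translation_loss, src_states, tgt_states, weight=1.0):
    """Translation loss plus the weighted auxiliary term (assumed form)."""
    return translation_loss + weight * alignment_loss(src_states, tgt_states)
```

Here `src_states` and `tgt_states` stand in for encoder outputs of a sentence in two languages, given as lists of token vectors. The penalty is near 0 when the pooled vectors point in the same direction and 1 when they are orthogonal, so minimizing it pushes the encoder toward language-independent representations.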
