CL · Apr 24, 2020

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

arXiv:2004.11867v1 · 1086 citations
Originality: Incremental advance
AI Analysis

This work addresses inefficient multilingual translation for users who need broad language support, but it is an incremental contribution that builds on existing multilingual NMT approaches.

The paper tackles two weaknesses of massively multilingual neural machine translation: it underperforms bilingual models, and its zero-shot translations are poor. By addressing the model's capacity bottleneck and the off-target translation issue, it narrows the performance gap with bilingual models and improves zero-shot performance by ~10 BLEU.
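An off-target translation is output produced in a language other than the requested target. One plausible way to quantify the issue is an off-target rate: run a language identifier over the system outputs and count how often the detected language differs from the target. The sketch below assumes a pluggable `detect_lang` function; `off_target_rate`, `toy_detect`, and `TOY_VOCAB` are hypothetical names, and the toy vocabulary lookup only stands in for a real language-identification tool.

```python
from typing import Callable, Dict, Sequence, Set


def off_target_rate(
    hypotheses: Sequence[str],
    target_lang: str,
    detect_lang: Callable[[str], str],
) -> float:
    """Fraction of system outputs whose detected language is not target_lang."""
    if not hypotheses:
        return 0.0
    misses = sum(1 for hyp in hypotheses if detect_lang(hyp) != target_lang)
    return misses / len(hypotheses)


# Toy language identifier for illustration only: picks the language whose
# tiny (hypothetical) vocabulary covers the most words in the text.
TOY_VOCAB: Dict[str, Set[str]] = {
    "de": {"hallo", "welt"},
    "fr": {"bonjour", "monde"},
    "en": {"hello", "world"},
}


def toy_detect(text: str) -> str:
    words = text.lower().split()
    scores = {lang: sum(w in vocab for w in words) for lang, vocab in TOY_VOCAB.items()}
    return max(scores, key=scores.get)


# Zero-shot outputs that were all supposed to be German.
hyps = ["hallo welt", "hello world", "bonjour monde", "hallo"]
print(off_target_rate(hyps, "de", toy_detect))  # 0.5
```

Here two of the four outputs come out in English and French instead of German, so half of them are off-target.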

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ~10 BLEU, approaching conventional pivot-based methods.
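Random online backtranslation can be pictured as a data-augmentation step inside the training loop. The following is a minimal sketch under assumptions the abstract does not spell out: for each training pair it samples a random intermediate language other than the target, uses the current model to translate the target sentence into that language, and appends the synthetic (intermediate language → target) pair to the batch, so that language pairs missing from the training data are seen during training. `Example`, `robt_augment`, `translate`, and `toy_translate` are hypothetical names; `translate` stands in for the model's own decoding.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Example:
    src: str       # source sentence
    tgt: str       # reference target sentence
    tgt_lang: str  # language the model must translate into


# Hypothetical stand-in for the current model's decoder: (text, into_lang) -> translation.
Translate = Callable[[str, str], str]


def robt_augment(
    batch: Sequence[Example],
    languages: Sequence[str],
    translate: Translate,
    rng: random.Random,
) -> List[Example]:
    """Return the batch plus one back-translated example per original pair."""
    augmented = list(batch)
    for ex in batch:
        # Randomly choose an intermediate language different from the target.
        candidates = [lang for lang in languages if lang != ex.tgt_lang]
        if not candidates:
            raise ValueError("need at least one language besides the target")
        pivot = rng.choice(candidates)
        # Back-translate the target sentence into that language with the current model.
        synthetic_src = translate(ex.tgt, pivot)
        # Train on intermediate-language source -> original target.
        augmented.append(Example(src=synthetic_src, tgt=ex.tgt, tgt_lang=ex.tgt_lang))
    return augmented


# Usage with a toy "model" that merely tags its input with the requested language.
def toy_translate(text: str, into_lang: str) -> str:
    return f"<{into_lang}> {text}"


batch = [Example("hello world", "hallo welt", "de"), Example("good morning", "bonjour", "fr")]
out = robt_augment(batch, ["en", "de", "fr"], toy_translate, random.Random(0))
print(len(out))  # 4: two original pairs plus two synthetic ones
```

Because the back-translations come from the model being trained, they are regenerated as training proceeds ("online") rather than produced once offline, and the random choice of intermediate language spreads the synthetic data across many otherwise unseen directions.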

Code Implementations: 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
