CL · Sep 14, 2021

Efficient Inference for Multilingual Neural Machine Translation

arXiv:2109.06679v2 · 664 citations
Originality: Incremental advance
AI Analysis

This work addresses inference efficiency for production deployment of multilingual NMT; it is incremental in that it builds on existing architectures.

The paper tackles slow inference in multilingual neural machine translation by proposing a shallow decoder combined with vocabulary filtering, which makes inference more than twice as fast with no loss in translation quality, as validated on 380 language pairs.
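The intuition behind moving depth from the decoder to the encoder can be seen with a rough count of sequential layer evaluations. This is a minimal sketch under stated assumptions: every encoder and decoder layer costs the same, the encoder processes the source in parallel, and the decoder runs all of its layers once per generated token. It ignores attention length, batching, and the output projection. The function names and the 6-6 and 12-2 layer configurations are hypothetical illustrations, not the paper's settings.

```python
from typing import Tuple


def sequential_layer_passes(enc_layers: int, dec_layers: int, target_len: int) -> int:
    """Rough latency proxy for autoregressive decoding (hypothetical cost model).

    The encoder handles the whole source sentence in parallel, so it adds
    `enc_layers` sequential passes once. The decoder emits one token at a
    time and runs all of its layers at every step.
    """
    return enc_layers + dec_layers * target_len


def relative_speedup(baseline: Tuple[int, int], light: Tuple[int, int], target_len: int) -> float:
    """Speedup of a `light` (enc, dec) configuration over `baseline` under this model."""
    return sequential_layer_passes(*baseline, target_len) / sequential_layer_passes(*light, target_len)


if __name__ == "__main__":
    # Hypothetical configurations: balanced 6-6 vs. deep-encoder/shallow-decoder 12-2.
    for target_len in (10, 30, 100):
        speedup = relative_speedup((6, 6), (12, 2), target_len)
        print(f"T={target_len}: {speedup:.2f}x")
```

Under these assumptions, extra encoder layers are paid once per sentence while each decoder layer is paid once per output token, so for a 10-token output the count drops from 66 to 32 passes. Measured speedups depend on hardware and batch size, and the per-token output projection over the vocabulary is unaffected by decoder depth.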

Multilingual NMT has become an attractive solution for MT deployment in production. However, matching bilingual quality comes at the cost of larger and slower models. In this work, we consider several ways to make multilingual NMT faster at inference without degrading its quality. We experiment with several "light decoder" architectures in two 20-language multi-parallel settings: small-scale on TED Talks and large-scale on ParaCrawl. Our experiments demonstrate that combining a shallow decoder with vocabulary filtering makes inference more than twice as fast with no loss in translation quality. We validate our findings with BLEU and chrF (on 380 language pairs), robustness evaluation, and human evaluation.
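Vocabulary filtering restricts the output projection to the part of the shared multilingual vocabulary that is relevant for the target language, so each decoding step scores fewer tokens. The following is a minimal standard-library sketch, assuming the per-language token list is built by counting tokens in target-language training text and keeping those above a frequency threshold, with special tokens such as end-of-sentence always kept. The helper names, the threshold, and the toy data are hypothetical, and this is not presented as the paper's exact procedure.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def build_language_vocab(
    target_corpus: Sequence[Sequence[str]],
    vocab: Dict[str, int],
    min_count: int = 1,
    always_keep: Iterable[int] = (),
) -> List[int]:
    """Sorted ids of tokens seen at least `min_count` times in target-language text.

    `always_keep` holds ids (e.g. end-of-sentence) that must stay decodable
    even if they never appear in the corpus.
    """
    counts = Counter(tok for sent in target_corpus for tok in sent)
    ids = {vocab[tok] for tok, c in counts.items() if c >= min_count and tok in vocab}
    ids.update(always_keep)
    return sorted(ids)


def filtered_logits(hidden: Sequence[float], out_proj: Sequence[Sequence[float]],
                    allowed_ids: Sequence[int]) -> List[float]:
    """Score only the allowed rows: len(allowed_ids) dot products instead of len(vocab)."""
    return [sum(h * w for h, w in zip(hidden, out_proj[i])) for i in allowed_ids]


def greedy_step(hidden: Sequence[float], out_proj: Sequence[Sequence[float]],
                allowed_ids: Sequence[int]) -> int:
    """Pick the best allowed token and map it back to its full-vocabulary id."""
    logits = filtered_logits(hidden, out_proj, allowed_ids)
    best = max(range(len(logits)), key=logits.__getitem__)
    return allowed_ids[best]


if __name__ == "__main__":
    # Toy shared vocabulary and a tiny "French" target corpus (illustrative only).
    vocab = {"</s>": 0, "le": 1, "chat": 2, "the": 3, "cat": 4}
    fr_allowed = build_language_vocab([["le", "chat"], ["le"]], vocab, always_keep=[vocab["</s>"]])
    out_proj = [[0.1, 0.0], [0.5, 0.0], [0.2, 0.0], [0.9, 0.0], [0.3, 0.0]]
    hidden = [1.0, 0.0]
    print(fr_allowed)                                 # [0, 1, 2]
    print(greedy_step(hidden, out_proj, fr_allowed))  # 1 ("le"), although "the" scores higher overall
```

In a real model the filtered rows of the projection matrix would be selected once per target language and reused at every decoding step; plain lists are used here only to keep the sketch self-contained.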
