CL · Apr 18, 2021

Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders

arXiv:2104.08757v2 · 666 citations
AI Analysis

This work addresses the challenge of enabling NMT models to translate between language pairs not seen during training. The problem matters for making translation more efficient and accessible in multilingual settings. The contribution is incremental, however, because it builds on existing pretrained encoders.

The paper tackles zero-shot cross-lingual transfer in neural machine translation by proposing SixT. The model leverages a multilingual pretrained encoder with a two-stage training schedule, a position-disentangled encoder, and a capacity-enhanced decoder. SixT achieves an average improvement of 7.1 BLEU over mBART on zero-shot any-to-English test sets across 14 source languages.

Previous work mainly focuses on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or on improving the performance of supervised machine translation with BERT. However, whether an MPE can facilitate the cross-lingual transferability of an NMT model remains under-explored. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT. In this task, the NMT model is trained with a parallel dataset of only one language pair and an off-the-shelf MPE, and is then tested directly on zero-shot language pairs. We propose SixT, a simple yet effective model for this task. SixT leverages the MPE with a two-stage training schedule and is further improved with a position-disentangled encoder and a capacity-enhanced decoder. Using this method, SixT significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages. Furthermore, with much lower training computation cost and much less training data, our model achieves better performance on 15 any-to-English test sets than CRISS and m2m-100, two strong multilingual NMT baselines.
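The abstract names a two-stage training schedule but does not spell it out. The following is a minimal sketch under an assumption that this page does not confirm. In stage 1, every module initialized from the MPE stays frozen while the newly initialized decoder is trained. In stage 2, the encoder layers are also unfrozen, but the encoder embeddings stay frozen. The module names (`encoder.embeddings`, `encoder.layers`, `decoder.embeddings`, `decoder.layers`) and the helpers `build_model` and `apply_stage` are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    pretrained: bool  # True if initialized from the MPE (assumption)
    trainable: bool = True


def build_model():
    # Hypothetical module groups for an MPE-initialized encoder-decoder.
    return [
        Module("encoder.embeddings", pretrained=True),
        Module("encoder.layers", pretrained=True),
        Module("decoder.embeddings", pretrained=False),
        Module("decoder.layers", pretrained=False),
    ]


def apply_stage(modules, stage):
    """Set trainable flags for an assumed two-stage schedule.

    Stage 1: freeze every MPE-initialized module; train only new modules.
    Stage 2: also unfreeze the encoder layers, keeping encoder embeddings frozen.
    Returns the set of trainable module names.
    """
    if stage not in (1, 2):
        raise ValueError(f"unknown stage: {stage}")
    for m in modules:
        if not m.pretrained:
            m.trainable = True  # randomly initialized parts always train
        elif stage == 1:
            m.trainable = False  # keep the MPE fixed at first
        else:
            m.trainable = m.name != "encoder.embeddings"
    return {m.name for m in modules if m.trainable}


modules = build_model()
print(sorted(apply_stage(modules, 1)))
print(sorted(apply_stage(modules, 2)))
```

In a PyTorch implementation, the same decision would typically be applied by setting `requires_grad` on every parameter of the corresponding submodule. Keeping the embeddings frozen is one plausible way to preserve the shared multilingual representations of the MPE, which zero-shot transfer depends on.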

Code Implementations: 1 repo
