CL · Jun 4, 2019

Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li

arXiv:1906.01181v1 · 31.71147 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of zero-shot translation for multilingual NLP systems, offering a practical solution to improve performance without extensive training data.

The paper tackles the degeneracy problem in zero-shot neural machine translation. It analyzes spurious correlations and proposes decoder pre-training and back-translation, yielding improvements of 4 to 22 BLEU points over vanilla zero-shot translation and matching or exceeding pivot-based approaches.

Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails and is sensitive to hyper-parameter settings. Its performance typically lags far behind the more conventional pivot-based approach, which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4 to 22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.
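
The abstract refers to the mutual information between the language ID of the source sentence and the language ID of the decoded sentence. As a rough illustration only, the sketch below computes a plug-in (count-based) estimate of that quantity from a list of (source language, decoded language) pairs. The function name `mutual_information` and the input format are assumptions made for this example; the paper's actual estimator is not described in this summary and may differ.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in nats.

    `pairs` is an iterable of (source_lang_id, decoded_lang_id) tuples.
    This is a hypothetical helper for illustration, not the paper's code.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (source, decoded) pair")

    joint = Counter(pairs)                 # counts of (x, y)
    src = Counter(x for x, _ in pairs)     # marginal counts of x
    out = Counter(y for _, y in pairs)     # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
        mi += p_xy * math.log(c * n / (src[x] * out[y]))
    return mi


# Decoded language is independent of the source language.
independent = [(s, t) for s in ("de", "fr") for t in ("en", "es")]

# Decoded language is fully determined by the source language.
determined = [("de", "en"), ("fr", "es")]
```

With these toy inputs, `mutual_information(independent)` is 0 and `mutual_information(determined)` equals `math.log(2)`, the entropy of a uniform two-language distribution. A high value means the decoded language can be predicted from the source language alone, which is the kind of dependence the abstract describes as a spurious correlation.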
