CL · AI · Dec 5, 2020

Reciprocal Supervised Learning Improves Neural Machine Translation

arXiv:2012.02975v1
AI Analysis

This work aims to improve neural machine translation accuracy by leveraging multiple models more effectively than traditional self-training or knowledge distillation.

Self-training has seen limited success in neural machine translation (NMT) because of the compositionality of the target space, which can cause a model to reinforce its own mistakes. This paper introduces Reciprocal-Supervised Learning (RSL), a method in which multiple diverse models each generate pseudo-parallel data and every model is then trained cooperatively on the combined synthetic corpus. The authors report significant gains on several benchmarks.

Despite the recent success on image classification, self-training has only achieved limited gains on structured prediction tasks such as neural machine translation (NMT). This is mainly due to the compositionality of the target space, where the far-away prediction hypotheses lead to the notorious reinforced mistake problem. In this paper, we revisit the utilization of multiple diverse models and present a simple yet effective approach named Reciprocal-Supervised Learning (RSL). RSL first exploits individual models to generate pseudo parallel data, and then cooperatively trains each model on the combined synthetic corpus. RSL leverages the fact that different parameterized models have different inductive biases, and better predictions can be made by jointly exploiting the agreement among each other. Unlike the previous knowledge distillation methods built upon a much stronger teacher, RSL is capable of boosting the accuracy of one model by introducing other comparable or even weaker models. RSL can also be viewed as a more efficient alternative to ensemble. Extensive experiments demonstrate the superior performance of RSL on several benchmarks with significant margins.
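The two steps described in the abstract (each model generates pseudo-parallel data, then every model trains on the combined synthetic corpus) can be illustrated with a minimal toy sketch. This is not the authors' implementation. `ToyModel`, `reciprocal_round`, and the French-English word tables are hypothetical names and data introduced only for illustration. The word-for-word lookup table and the count-based "training" stand in for neural models trained by gradient descent, and mixing a small genuine parallel corpus into the training data is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


class ToyModel:
    """Hypothetical stand-in for an NMT model: a word-for-word lookup table."""

    def __init__(self, table: Dict[str, str]) -> None:
        self.table = dict(table)

    def translate(self, src: str) -> str:
        # Unknown source words are copied through unchanged.
        return " ".join(self.table.get(w, w) for w in src.split())

    def train(self, corpus: List[Pair]) -> None:
        # Toy "training" (an assumption, not gradient descent): for each source
        # word, keep the target word it is most often aligned with.
        counts: Dict[str, Counter] = defaultdict(Counter)
        for src, tgt in corpus:
            for s, t in zip(src.split(), tgt.split()):
                counts[s][t] += 1
        for s, c in counts.items():
            self.table[s] = c.most_common(1)[0][0]


def reciprocal_round(
    models: List[ToyModel], parallel: List[Pair], monolingual: List[str]
) -> List[Pair]:
    # Step 1: every model translates the source sentences, producing
    # pseudo-parallel pairs. All translations are made before any training.
    synthetic = [(src, m.translate(src)) for m in models for src in monolingual]
    # Step 2: every model trains on the combined corpus, so each one is
    # supervised by the outputs of all the others as well as its own.
    combined = list(parallel) + synthetic
    for m in models:
        m.train(combined)
    return synthetic


# Three comparable models, each wrong on a different word.
models = [
    ToyModel({"le": "the", "chat": "cat", "dort": "sleep"}),   # wrong on "dort"
    ToyModel({"le": "the", "chat": "dog", "dort": "sleeps"}),  # wrong on "chat"
    ToyModel({"le": "a", "chat": "cat", "dort": "sleeps"}),    # wrong on "le"
]
parallel = [("le chat", "the cat")]
monolingual = ["le chat dort"]

reciprocal_round(models, parallel, monolingual)
outputs = [m.translate("le chat dort") for m in models]
print(outputs)  # each model now outputs "the cat sleeps"
```

In this toy setting every model makes a different error, so the combined corpus outvotes each individual mistake. This mirrors the abstract's point that comparable or even weaker models can still improve one another through agreement. With real neural models the effect depends on how diverse the models are and is not guaranteed.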

