CL · AI · May 27, 2021

Selective Knowledge Distillation for Neural Machine Translation

arXiv:2105.12967v1728 citations
Originality: Incremental advance
AI Analysis

For machine translation practitioners, this work addresses the efficiency and performance of knowledge distillation, offering incremental improvements by optimizing which training samples are used for distillation.

The paper tackles the observation that not all samples are equally beneficial for knowledge distillation in neural machine translation. It finds that some samples can even hurt performance, and it proposes selective strategies that improve BLEU by up to +1.28 and +0.89 points on two translation tasks.

Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely applied to enhance a model's performance by transferring the teacher model's knowledge on each training sample. However, previous work rarely discusses the different impacts of, and connections among, these samples, which serve as the medium for transferring teacher knowledge. In this paper, we design a novel protocol that can effectively analyze the different impacts of samples by comparing various partitions of the samples. Based on this protocol, we conduct extensive experiments and find that more teacher knowledge is not always better: knowledge over specific samples may even hurt the overall performance of knowledge distillation. Finally, to address these issues, we propose two simple yet effective strategies, i.e., batch-level and global-level selection, to pick suitable samples for distillation. We evaluate our approaches on two large-scale machine translation tasks, WMT'14 English→German and WMT'19 Chinese→English. Experimental results show that our approaches yield improvements of up to +1.28 and +0.89 BLEU points over the Transformer baseline, respectively.
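The batch-level and global-level selection strategies can be illustrated with a small sketch. This is one hedged reading, not the authors' code. It assumes that (i) the unit of selection is a target token, (ii) difficulty is the student's cross-entropy (CE) against the reference, with the highest-CE fraction `ratio` receiving the distillation term, and (iii) global-level selection estimates its threshold from a fixed-size FIFO queue of recent CE values. All names (`batch_level_select`, `GlobalLevelSelector`, `selective_kd_loss`, `ratio`, `queue_size`, `alpha`) are hypothetical, and losses are plain floats rather than framework tensors.

```python
from collections import deque


def batch_level_select(ce_losses, ratio):
    """Hypothetical batch-level selection: indices of the `ratio` share of
    tokens in this batch with the highest student CE (assumed difficulty metric)."""
    k = int(len(ce_losses) * ratio)
    ranked = sorted(range(len(ce_losses)), key=lambda i: ce_losses[i], reverse=True)
    return set(ranked[:k])


class GlobalLevelSelector:
    """Hypothetical global-level selection: the CE threshold is estimated over a
    FIFO queue of recent token CE values instead of the current batch alone."""

    def __init__(self, ratio, queue_size):
        self.ratio = ratio
        self.queue = deque(maxlen=queue_size)  # oldest values drop out automatically

    def select(self, ce_losses):
        self.queue.extend(ce_losses)
        ranked = sorted(self.queue, reverse=True)
        k = int(len(ranked) * self.ratio)
        if k == 0:
            return set()
        threshold = ranked[k - 1]  # smallest CE still inside the global top-`ratio` share
        return {i for i, ce in enumerate(ce_losses) if ce >= threshold}


def selective_kd_loss(ce_losses, kd_losses, selected, alpha=0.5):
    """Mean per-token loss: plain CE on every token, interpolated with the
    KD loss only on selected tokens (the `alpha` mixing is an assumption)."""
    total = 0.0
    for i, (ce, kd) in enumerate(zip(ce_losses, kd_losses)):
        total += ((1 - alpha) * ce + alpha * kd) if i in selected else ce
    return total / len(ce_losses)
```

Batch-level selection always distills the same share of each batch, whereas the queue-based threshold lets a batch of mostly easy tokens receive little or no distillation. In a real trainer, the per-token CE and KD values would come from the student and teacher output distributions.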

Code Implementations: 1 repo
