CL

May 20, 2023

Accurate Knowledge Distillation with n-best Reranking

arXiv:2305.12057v4

1.74 citations

Originality: Incremental advance

AI Analysis

This work addresses the efficiency-accuracy trade-off in machine translation for practitioners needing compact models.

The paper improves sequence-level knowledge distillation for machine translation by reranking n-best hypotheses with multiple models and using the top-ranked hypotheses as pseudo-labels. The resulting student model achieves accuracy comparable to a 4.7-billion-parameter model while having two orders of magnitude fewer parameters.

We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016), where we extract pseudo-labels for the student model's training data from the top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions, or architectures, including some publicly available large language models, to pick the highest-quality hypotheses as labels. The effectiveness of our proposal is validated through experiments on the WMT'21 German-English and Chinese-English translation tasks. Our results demonstrate that utilizing pseudo-labels generated by our n-best reranker leads to a significantly more accurate student model. In fact, our best student model achieves comparable accuracy to a large translation model from Tran et al. (2021) with 4.7 billion parameters, while having two orders of magnitude fewer parameters.
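
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each reranking model is exposed as a function that scores a (source, hypothesis) pair, and that the scores are combined by a weighted sum. The abstract does not specify how the scores are combined, so the weighted sum, the names `rerank_nbest` and `build_pseudo_labels`, and the two toy scorers are all hypothetical and stand in for real translation or language models.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (source, hypothesis) to a score; higher means better.
# In practice each scorer would wrap a model; here they are toy heuristics.
Scorer = Callable[[str, str], float]


def rerank_nbest(
    source: str,
    hypotheses: Sequence[str],
    scorers: Sequence[Scorer],
    weights: Sequence[float],
) -> str:
    """Return the hypothesis with the highest weighted sum of scorer outputs.

    The weighted sum is an assumption made for this sketch, not necessarily
    the combination used in the paper.
    """
    if not hypotheses:
        raise ValueError("n-best list is empty")
    if len(scorers) != len(weights):
        raise ValueError("need exactly one weight per scorer")

    def combined(hyp: str) -> float:
        return sum(w * s(source, hyp) for s, w in zip(scorers, weights))

    return max(hypotheses, key=combined)


def build_pseudo_labels(
    sources: Sequence[str],
    nbest_lists: Sequence[Sequence[str]],
    scorers: Sequence[Scorer],
    weights: Sequence[float],
) -> List[Tuple[str, str]]:
    """Pair every source sentence with its top-reranked hypothesis."""
    return [
        (src, rerank_nbest(src, nbest, scorers, weights))
        for src, nbest in zip(sources, nbest_lists)
    ]


# Toy scorers (hypothetical stand-ins for real models).
def length_match(source: str, hyp: str) -> float:
    """Penalize hypotheses whose word count differs from the source."""
    return -abs(len(hyp.split()) - len(source.split()))


def no_repetition(source: str, hyp: str) -> float:
    """Penalize repeated words in the hypothesis."""
    words = hyp.split()
    return -(len(words) - len(set(words)))


if __name__ == "__main__":
    src = "das ist ein Test"
    nbest = ["this is", "this is a test", "this is a test test test test"]
    pairs = build_pseudo_labels(
        [src], [nbest], [length_match, no_repetition], [1.0, 0.5]
    )
    print(pairs)
```

The resulting (source, pseudo-label) pairs would then serve as training data for the student model, in place of the teacher's single top hypothesis.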

