CLIR
May 6, 2024

GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation

arXiv:2405.03764v2
23 citations
EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses a practical deployment challenge for question-answering systems by enabling efficient knowledge distillation without labels, though it is an incremental improvement on existing distillation methods.

The paper tackles the problem of effectively ensembling knowledge from multiple teacher models during unsupervised distillation for question-answering systems. The student model reaches 99.5% of the teacher ensemble's performance with only 1% of the ensemble's inference budget.

Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. However, for practical deployment, it is crucial to perform knowledge distillation to maintain high performance while operating under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student model performance, how can knowledge from multiple teacher models be effectively ensembled during this stage without the guidance of labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN has demonstrated significant improvements in both offline and online experiments, enabling the student model to achieve results comparable to those of teacher ensembles. Our experiments show that GOVERN requires a mere 1% of the ensemble method's inference budget to achieve 99.5% of its performance. The proposed algorithm has been successfully deployed in a real-world commercial question-answering system, demonstrating its practical applicability.

