LG | AI | Jul 22, 2024

Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures

arXiv:2407.16040v2 | 1 citation | h-index: 6
Originality: Incremental advance
AI Analysis

The work addresses the impracticality of customizing teacher-student pairs when one model must be compressed for diverse hardware, and offers a more efficient route to deployment.

The paper tackles the problem of knowledge distillation being limited by architectural gaps between teacher and student models, proposing a generic teacher network that improves effectiveness and amortizes training costs across multiple student architectures.

Knowledge distillation (KD) is a model compression method that entails training a compact student model to emulate the performance of a more complex teacher model. However, the architectural capacity gap between the two models limits the effectiveness of knowledge transfer. Addressing this issue, previous works focused on customizing teacher-student pairs to improve compatibility, a computationally expensive process that needs to be repeated every time either model changes. Hence, these methods are impractical when a teacher model has to be compressed into different student models for deployment on multiple hardware devices with distinct resource constraints. In this work, we propose Generic Teacher Network (GTN), a one-off KD-aware training to create a generic teacher capable of effectively transferring knowledge to any student model sampled from a given finite pool of architectures. To this end, we represent the student pool as a weight-sharing supernet and condition our generic teacher to align with the capacities of various student architectures sampled from this supernet. Experimental evaluation shows that our method both improves overall KD effectiveness and amortizes the minimal additional training cost of the generic teacher across students in the pool.
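The abstract does not spell out the distillation loss or the supernet design, so the following is only a minimal sketch under stated assumptions: it uses the standard temperature-softened KL distillation loss and a toy weight-sharing pool in which each student reuses the first `width` hidden units of one shared layer. All names here (`kd_loss`, `student_logits`, `pool_kd_loss`, `STUDENT_WIDTHS`) are hypothetical and this is not the authors' GTN implementation. It only evaluates the kind of pool-averaged loss a teacher could be conditioned on and performs no gradient step.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits_, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the classic soft-target KD loss; the paper may use another.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits_, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical weight-sharing "supernet": one shared hidden layer whose
# first `width` units define a student. Sizes are arbitrary toy values.
FEATURES, HIDDEN, CLASSES = 4, 8, 3
_rng = random.Random(0)
W1 = [[_rng.uniform(-1, 1) for _ in range(FEATURES)] for _ in range(HIDDEN)]
W2 = [[_rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(CLASSES)]
STUDENT_WIDTHS = [2, 4, 8]  # the finite pool of student architectures


def student_logits(x, width):
    """Forward pass of the student that uses the first `width` shared units."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(W1[j], x)))  # ReLU
        for j in range(width)
    ]
    return [sum(W2[c][j] * hidden[j] for j in range(width)) for c in range(CLASSES)]


def pool_kd_loss(x, teacher_logits, num_samples=3, seed=0):
    """Average KD loss over students sampled uniformly from the pool."""
    sampler = random.Random(seed)
    widths = [sampler.choice(STUDENT_WIDTHS) for _ in range(num_samples)]
    losses = [kd_loss(teacher_logits, student_logits(x, w)) for w in widths]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    x = [0.5, -1.0, 0.25, 2.0]
    teacher = [2.0, 0.5, -1.0]  # made-up teacher logits for illustration
    print("pool-averaged KD loss:", pool_kd_loss(x, teacher))
```

Averaging over sampled students is one simple way to expose a teacher to several student capacities at once. How the actual method weights or schedules students is not stated in the abstract.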

Code Implementations: 1 repo
