Harmonizing Knowledge Transfer in Neural Network with Unified Distillation
This work addresses how to harmonize knowledge transfer for model compression and efficiency, although its advance over existing distillation techniques is incremental.
The paper introduces a unified distillation framework for knowledge transfer in neural networks. The framework aggregates intermediate features into a comprehensive representation and applies distribution constraints to it. The reported accuracy gains on benchmark datasets reach up to 2.5% over baseline methods.
Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has attracted increasing attention. KD methods fall into two primary categories: feature-based methods, which focus on the features of intermediate layers, and logits-based methods, which target the logits of the final layer. This paper introduces a novel perspective by leveraging diverse knowledge sources within a unified KD framework. Specifically, we aggregate features from intermediate layers into a comprehensive representation, effectively gathering semantic information from different stages and scales. We then predict distribution parameters from this representation. These steps transform the knowledge in the intermediate layers into a distributional form, so that knowledge can be distilled through a single, unified distribution constraint at different stages of the network, ensuring that knowledge transfer is comprehensive and coherent. Extensive experiments validate the effectiveness of the proposed method.
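The pipeline described in the abstract (aggregate intermediate features, predict distribution parameters, apply a distribution constraint) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the aggregation operator, the distribution family, or the divergence, so the sketch assumes global average pooling followed by concatenation, a diagonal Gaussian parameterized by a mean and a log-variance, and a KL divergence from teacher to student. All function names (`global_avg_pool`, `aggregate`, `make_head`, `predict_gaussian`, `gaussian_kl`) are hypothetical, and the prediction heads use random, untrained weights.

```python
import math
import random


def global_avg_pool(feature_map):
    """Average each channel of a C x H x W nested list down to one number."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def aggregate(stage_features):
    """Concatenate the pooled features of all stages into one vector.

    Assumption: pooling + concatenation stands in for the paper's aggregation.
    """
    representation = []
    for feature_map in stage_features:
        representation.extend(global_avg_pool(feature_map))
    return representation


def linear(x, weights, bias):
    """Matrix-vector product plus bias, written out for plain lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_head(in_dim, out_dim, rng):
    """Hypothetical prediction head with small random (untrained) weights."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    return {"w_mu": matrix(), "b_mu": [0.0] * out_dim,
            "w_lv": matrix(), "b_lv": [0.0] * out_dim}


def predict_gaussian(representation, head):
    """Predict the mean and log-variance of an assumed diagonal Gaussian."""
    mean = linear(representation, head["w_mu"], head["b_mu"])
    log_var = linear(representation, head["w_lv"], head["b_lv"])
    return mean, log_var


def gaussian_kl(mean_t, log_var_t, mean_s, log_var_s):
    """KL(teacher || student) between two diagonal Gaussians, summed over dims."""
    total = 0.0
    for mt, lt, ms, ls in zip(mean_t, log_var_t, mean_s, log_var_s):
        total += 0.5 * (ls - lt + (math.exp(lt) + (mt - ms) ** 2) / math.exp(ls) - 1.0)
    return total


def random_feature_map(channels, height, width, rng):
    """Stand-in for a real intermediate activation."""
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]


rng = random.Random(0)

# Two stages per network; the student is narrower than the teacher.
teacher_stages = [random_feature_map(4, 8, 8, rng), random_feature_map(8, 4, 4, rng)]
student_stages = [random_feature_map(2, 8, 8, rng), random_feature_map(4, 4, 4, rng)]

teacher_rep = aggregate(teacher_stages)  # 4 + 8 = 12 values
student_rep = aggregate(student_stages)  # 2 + 4 = 6 values

# Both heads project to the same latent size so the distributions are comparable.
latent_dim = 3
teacher_head = make_head(len(teacher_rep), latent_dim, rng)
student_head = make_head(len(student_rep), latent_dim, rng)

mean_t, log_var_t = predict_gaussian(teacher_rep, teacher_head)
mean_s, log_var_s = predict_gaussian(student_rep, student_head)

loss = gaussian_kl(mean_t, log_var_t, mean_s, log_var_s)
print(f"distribution-matching loss: {loss:.6f}")
```

Projecting both networks to a shared latent dimension lets one divergence serve networks of different widths. In this sketch, that shared projection is what makes the constraint "unified". In a real training loop the student head and backbone would be updated to reduce this loss, typically alongside the task loss.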