CV · Aug 15, 2021

Multi-granularity for knowledge distillation

arXiv:2108.06681v1 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving knowledge distillation for neural network compression, but it appears incremental because it builds on existing distillation frameworks.

The paper tackles knowledge distillation by proposing a multi-granularity mechanism that transfers more understandable knowledge from teacher to student networks. It improves accuracy over the baselines by 0.58% on average and by up to 1.08%, and it also improves the student's fine-tuning ability and robustness to noisy inputs.

Considering that students have different abilities to understand the knowledge imparted by teachers, a multi-granularity distillation mechanism is proposed for transferring more understandable knowledge to student networks. A multi-granularity self-analyzing module of the teacher network is designed, which enables the student network to learn knowledge from different teaching patterns. Furthermore, a stable excitation scheme is proposed to provide robust supervision during student training. The proposed distillation mechanism can be embedded into different distillation frameworks, which are taken as baselines. Experiments show that the mechanism improves accuracy over the baselines by 0.58% on average and by 1.08% in the best case, making its performance superior to state-of-the-art methods. It is also shown that the mechanism improves the student's fine-tuning ability and its robustness to noisy inputs. The code is available at https://github.com/shaoeric/multi-granularity-distillation.
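For readers new to the setting, the sketch below shows the standard Hinton-style distillation loss that frameworks like the baselines build on: teacher and student logits are softened with a temperature `T`, and the student is penalized by `T^2 * KL(p_teacher || p_student)`. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The helper `multi_pattern_kd_loss` is hypothetical: it simply averages the standard loss over several teacher output vectors as a loose stand-in for "learning from different teaching patterns", and it does not reproduce the paper's multi-granularity self-analyzing module or stable excitation scheme. The temperature value of 4.0 is an arbitrary illustrative choice.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """Standard Hinton-style KD loss: T^2 * KL(p_teacher || p_student) on softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


def multi_pattern_kd_loss(student_logits: Sequence[float],
                          teacher_patterns: Sequence[Sequence[float]],
                          temperature: float = 4.0) -> float:
    """Hypothetical: average the standard KD loss over several teacher outputs ("teaching patterns")."""
    losses = [kd_loss(student_logits, t, temperature) for t in teacher_patterns]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    # Hypothetical teacher outputs, e.g. from different heads of a teacher network.
    patterns = [[3.0, 0.2, -1.5], [2.5, 1.0, -0.5], [1.8, 0.9, -1.2]]
    print(round(multi_pattern_kd_loss(student, patterns), 4))
```

In practice a distillation loss like this is typically combined with the ordinary cross-entropy on ground-truth labels and computed on batched tensors in a deep learning framework rather than on Python lists.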

Code Implementations: 1 repo