CV, AI, LG | Dec 9, 2020

Progressive Network Grafting for Few-Shot Knowledge Distillation

arXiv:2012.04915v2 | 55 citations | Has Code
AI Analysis

This work reduces the amount of data that knowledge distillation requires, which lowers the cost of deep model compression for practitioners.

This paper addresses few-shot knowledge distillation, where only a few unlabeled samples per category are available for model compression. The authors introduce a dual-stage distillation scheme that achieves performance on par with full-dataset knowledge distillation methods on CIFAR10 and CIFAR100, and gratifying results on ILSVRC-2012.

Knowledge distillation has demonstrated encouraging performances in deep model compression. Most existing approaches, however, require massive labeled data to accomplish the knowledge transfer, making the model compression a cumbersome and costly process. In this paper, we investigate the practical few-shot knowledge distillation scenario, where we assume only a few samples without human annotations are available for each category. To this end, we introduce a principled dual-stage distillation scheme tailored for few-shot data. In the first step, we graft the student blocks one by one onto the teacher, and learn the parameters of the grafted block intertwined with those of the other teacher blocks. In the second step, the trained student blocks are progressively connected and then together grafted onto the teacher network, allowing the learned student blocks to adapt themselves to each other and eventually replace the teacher network. Experiments demonstrate that our approach, with only a few unlabeled samples, achieves gratifying results on CIFAR10, CIFAR100, and ILSVRC-2012. On CIFAR10 and CIFAR100, our performances are even on par with those of knowledge distillation schemes that utilize the full datasets. The source code is available at https://github.com/zju-vipa/NetGraft.
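
The two grafting stages described in the abstract can be illustrated with a minimal sketch of which blocks make up the hybrid network at each step. This is not the authors' implementation (see the NetGraft repository for that). The function names `stage1_hybrids` and `stage2_hybrids` are hypothetical, blocks are represented by plain labels rather than trainable modules, and the assumption that stage two grows the student chain from the first block onward is ours, since the abstract only says the blocks are "progressively connected". Training itself is omitted.

```python
from typing import List, Sequence


def stage1_hybrids(teacher: Sequence[str], student: Sequence[str]) -> List[List[str]]:
    """Stage 1 (block grafting): build one hybrid network per block index.

    In hybrid i, student block i replaces teacher block i, while every
    other position keeps its teacher block. Per the abstract, only the
    grafted student block would be learned in each hybrid.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student need the same number of blocks")
    hybrids = []
    for i in range(len(teacher)):
        blocks = list(teacher)
        blocks[i] = student[i]  # graft a single student block
        hybrids.append(blocks)
    return hybrids


def stage2_hybrids(teacher: Sequence[str], student: Sequence[str]) -> List[List[str]]:
    """Stage 2 (network grafting): connect trained student blocks progressively.

    ASSUMPTION: the chain of student blocks grows from the front, so step k
    uses the first k student blocks followed by the remaining teacher
    blocks. The last hybrid contains student blocks only, i.e. the student
    has replaced the teacher.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student need the same number of blocks")
    n = len(teacher)
    return [list(student[:k]) + list(teacher[k:]) for k in range(2, n + 1)]


# Illustrative three-block networks; the labels are placeholders.
teacher_blocks = ["T1", "T2", "T3"]
student_blocks = ["S1", "S2", "S3"]

for hybrid in stage1_hybrids(teacher_blocks, student_blocks):
    print("stage 1:", " -> ".join(hybrid))
for hybrid in stage2_hybrids(teacher_blocks, student_blocks):
    print("stage 2:", " -> ".join(hybrid))
```

In a real setting each label would be a network block, and each hybrid would be optimized on the few unlabeled samples before moving to the next one.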
