LG, ML | Dec 5, 2018

Few Sample Knowledge Distillation for Efficient Network Compression

arXiv:1812.01839v3 | 48 citations
Originality: Incremental advance
AI Analysis

This work offers an efficient approach to network compression in resource-constrained settings, though it is an incremental advance that builds on existing knowledge distillation and compression techniques.

The paper tackles the problem of recovering prediction accuracy in highly compressed neural networks without requiring a large labeled dataset or lengthy fine-tuning. It does so by distilling knowledge from a few label-free samples, restoring the student-net's accuracy to the level of conventional fine-tuning within minutes while using only 1% of the training data.

Deep neural network compression techniques such as pruning and weight tensor decomposition usually require fine-tuning to recover the prediction accuracy when the compression ratio is high. However, conventional fine-tuning requires a large training set and a time-consuming training procedure. This paper proposes a novel solution for knowledge distillation from a few label-free samples that achieves both data efficiency and training/processing efficiency. We treat the original network as the "teacher-net" and the compressed network as the "student-net". A 1x1 convolution layer is added at the end of each layer block of the student-net, and we fit the block-level outputs of the student-net to those of the teacher-net by estimating the parameters of the added layers. We prove that the added layers can be merged without adding extra parameters or computation cost during inference. Experiments on multiple datasets and network architectures verify the method's effectiveness on student-nets obtained by various network pruning and weight decomposition methods. Our method can recover the student-net's accuracy to the same level as conventional fine-tuning methods in minutes while using only 1% of the full training data, without labels.
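The two steps described in the abstract (fitting the added 1x1 layer to the teacher's block outputs, then merging it away) can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes the added 1x1 layer Q directly follows the block's last convolution with no nonlinearity in between, and that Q is estimated by ordinary least squares over every pixel of the few unlabeled samples; the paper's own solver may differ. The helper names `conv2d`, `fit_pointwise`, and `merge_pointwise`, and the synthetic "teacher" built from a random mixing matrix, are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w, b):
    """Stride-1, unpadded convolution. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    k = w.shape[-1]
    patches = sliding_window_view(x, (k, k), axis=(1, 2))  # (C_in, H-k+1, W-k+1, k, k)
    return np.einsum("chwij,ocij->ohw", patches, w) + b[:, None, None]


def fit_pointwise(student_out, teacher_out):
    """Least-squares 1x1 conv Q of shape (C_t, C_s) so that Q @ student ~= teacher per pixel.

    student_out: (N, C_s, H, W), teacher_out: (N, C_t, H, W), block outputs on unlabeled samples.
    """
    c_s, c_t = student_out.shape[1], teacher_out.shape[1]
    # Each spatial position of each sample becomes one regression row.
    s = student_out.transpose(0, 2, 3, 1).reshape(-1, c_s)
    t = teacher_out.transpose(0, 2, 3, 1).reshape(-1, c_t)
    # lstsq solves min_X ||s @ X - t||^2; in column-vector form t = X^T s, so Q = X^T.
    x, *_ = np.linalg.lstsq(s, t, rcond=None)
    return x.T


def merge_pointwise(q, w, b):
    """Fold the 1x1 conv q into the preceding conv (w, b).

    The merged conv has q.shape[0] output channels, i.e. the same shape as w when q is square.
    """
    w_merged = np.einsum("oc,cikl->oikl", q, w)
    b_merged = q @ b
    return w_merged, b_merged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_s = rng.standard_normal((4, 3, 3, 3))   # student's last conv in a block
    b_s = rng.standard_normal(4)
    q_true = rng.standard_normal((4, 4))      # synthetic stand-in for the teacher
    xs = rng.standard_normal((8, 3, 10, 10))  # a few unlabeled inputs

    student = np.stack([conv2d(x, w_s, b_s) for x in xs])
    teacher = np.einsum("oc,nchw->nohw", q_true, student)

    q = fit_pointwise(student, teacher)
    w_m, b_m = merge_pointwise(q, w_s, b_s)

    x_new = rng.standard_normal((3, 10, 10))
    two_layers = np.einsum("oc,chw->ohw", q, conv2d(x_new, w_s, b_s))
    print(np.allclose(q, q_true))                            # True
    print(np.allclose(conv2d(x_new, w_m, b_m), two_layers))  # True
```

Because a 1x1 convolution is a per-pixel linear map, composing it with a linear convolution yields another convolution with weights given by mixing the output channels of W through Q and bias Q b. When Q is square, the merged layer keeps the student's original shape, so inference cost does not grow. The closed-form least-squares fit is used here only for brevity.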

Code Implementations: 1 repo