CV · Aug 16, 2020

Cascaded channel pruning using hierarchical self-distillation

arXiv:2008.06814v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses model compression for deep learning applications, offering incremental improvements in pruning efficiency.

The paper tackles filter-level pruning for neural networks by introducing a hierarchical self-distillation method with teaching assistants, achieving improved accuracy and model compression on CIFAR10 and ImageNet tasks using VGG16 and ResNet50 architectures.

In this paper, we propose an approach for filter-level pruning with hierarchical knowledge distillation based on the teacher, teaching-assistant, and student framework. Our method makes use of teaching assistants at intermediate pruning levels that share the same architecture and weights as the target student. We propose to prune each model independently using the gradient information from its corresponding teacher. By considering the relative sizes of each student-teacher pair, this formulation provides a natural trade-off between the capacity gap for knowledge distillation and the bias of the filter saliency updates. Our results show improvements in the attainable accuracy and model compression across the CIFAR10 and ImageNet classification tasks using the VGG16 and ResNet50 architectures. We provide an extensive evaluation that demonstrates the benefits of using a varying number of teaching assistant models at different sizes.
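The ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the temperature-softened KL loss, the first-order (activation times gradient) saliency score, and all function names and cascade labels below are assumptions chosen for illustration. In the described method, the gradients fed to the saliency score would come from each model's own teacher.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Assumed loss form; the abstract does not specify the distillation loss.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def filter_saliency(activations, gradients):
    """Assumed first-order saliency per filter: |mean(activation * gradient)|.

    activations[f] and gradients[f] are flat lists of values for filter f.
    """
    return [
        abs(sum(a * g for a, g in zip(acts, grads)) / len(acts))
        for acts, grads in zip(activations, gradients)
    ]


def filters_to_keep(saliency, keep_ratio):
    """Indices of the most salient filters, keeping about keep_ratio of them."""
    n_keep = max(1, round(len(saliency) * keep_ratio))
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])


def teacher_student_pairs(levels):
    """Pair each model in the cascade with the next-larger model as its teacher."""
    return [(smaller, larger) for larger, smaller in zip(levels, levels[1:])]


# Hypothetical cascade, ordered from largest to smallest model.
levels = ["teacher", "assistant", "student"]
pairs = teacher_student_pairs(levels)
```

Here `pairs` links the assistant to the teacher and the student to the assistant, so each pruned model is guided by a teacher only one step larger than itself, which is the capacity-gap trade-off the abstract refers to.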

