CV · May 20, 2022

InDistill: Information flow-preserving knowledge distillation for model compression

arXiv:2205.10003v4 · 6 citations · h-index: 41 · Has Code
Originality: Incremental advance
AI Analysis

This work targets efficient model deployment by improving knowledge distillation (KD). The advance is incremental, since it builds on existing KD methods.

The paper tackles model compression by introducing InDistill, a warmup stage for knowledge distillation that transfers critical information flow paths from a teacher to a student model, resulting in consistent performance increases over baseline KD approaches on datasets like CIFAR-10, CIFAR-100, and ImageNet.

In this paper, we introduce InDistill, a method that serves as a warmup stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved via a training scheme based on curriculum learning that considers the distillation difficulty of each layer and the critical learning periods during which the information flow paths are established. This procedure can lead to a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation for cases where the widths of the teacher and student layers differ. This pruning operation reduces the width of the teacher's intermediate layers to match that of the student's, allowing direct distillation without the need for an encoding stage. The proposed method is extensively evaluated using various pairs of teacher-student architectures on the CIFAR-10, CIFAR-100, and ImageNet datasets, demonstrating that preserving the information flow paths consistently increases the performance of the baseline KD approaches in both classification and retrieval settings. The code is available at https://github.com/gsarridis/InDistill.
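
The two mechanisms the abstract describes, width-matching pruning and a layer-wise curriculum, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names, the L1-norm ranking criterion, and the linear epoch schedule are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def select_teacher_channels(
    filter_weights: Sequence[Sequence[float]], student_width: int
) -> List[int]:
    """Pick which teacher channels to keep so the layer matches the student's width.

    Assumption: channels are ranked by the L1 norm of their filter weights.
    This is an illustrative criterion, not necessarily the paper's.
    """
    if student_width > len(filter_weights):
        raise ValueError("student layer is wider than the teacher layer")
    # L1 norm of each channel's (flattened) filter.
    norms = [sum(abs(w) for w in filt) for filt in filter_weights]
    # Channel indices ordered from largest to smallest norm.
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    # Keep the top `student_width` channels, restored to their original order.
    return sorted(ranked[:student_width])


def curriculum_schedule(num_layers: int, base_epochs: int, step: int) -> List[int]:
    """Epoch budget per layer for shallow-to-deep, layer-by-layer distillation.

    Assumption: distillation difficulty grows linearly with depth, so layer i
    is given base_epochs + i * step epochs (a hypothetical schedule).
    """
    return [base_epochs + i * step for i in range(num_layers)]


# Toy teacher layer with 4 channels, each filter flattened to 2 weights.
teacher_filters = [[0.1, -0.2], [1.0, 0.5], [0.0, 0.05], [-0.7, 0.9]]
print(select_teacher_channels(teacher_filters, student_width=2))  # [1, 3]
print(curriculum_schedule(3, base_epochs=10, step=5))  # [10, 15, 20]
```

Once the teacher layer is pruned to the student's width, the intermediate features of the two models have the same shape and can be compared directly, which is why the abstract notes that no encoding stage is needed.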

Code Implementations: 1 repo
