CV | Nov 15, 2020

Online Ensemble Model Compression using Knowledge Distillation

arXiv:2011.07449v1 | 58 citations
AI Analysis

This paper addresses the problem of efficient model deployment for resource-constrained applications, though it is incremental over existing knowledge distillation methods.

The paper tackles model compression by proposing a knowledge distillation framework that trains an ensemble teacher and multiple compressed student models simultaneously, achieving a 10.64% relative accuracy gain for a 97% compressed ResNet110 on CIFAR100.

This paper presents a novel knowledge-distillation-based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each model learns unique representations from the data distribution due to its distinct architecture. This helps the ensemble generalize better by combining every model's knowledge. The distilled students and ensemble teacher are trained simultaneously without requiring any pretrained weights. Moreover, our proposed method can deliver multiple compressed students from a single training run, which is efficient and flexible for different scenarios. We provide comprehensive experiments using state-of-the-art classification models to validate our framework's effectiveness. Notably, using our framework, a 97% compressed ResNet110 student model produced a 10.64% relative accuracy gain over its individual baseline training on the CIFAR100 dataset. Similarly, a 95% compressed DenseNet-BC (k=12) model achieved an 8.17% relative accuracy gain.
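
The idea of distilling an ensemble's knowledge onto each student while everything trains together can be illustrated with a small, framework-free sketch. This is not the paper's exact objective: it assumes the ensemble teacher's logits are a plain average of the student logits and that each student minimizes a weighted sum of cross-entropy on the true label and a temperature-scaled KL term against the teacher. The function names and the `temperature` and `alpha` parameters are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of the softmax prediction against one true class index."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_logits(student_logits):
    """Assumed teacher: the element-wise mean of all student logits."""
    n = len(student_logits)
    return [sum(column) / n for column in zip(*student_logits)]


def online_distillation_losses(student_logits, label, temperature=2.0, alpha=0.5):
    """Return the teacher logits and one combined loss per student.

    Each loss mixes a hard-label term with a soft-label term computed
    against the ensemble teacher built from the same forward pass.
    """
    teacher = ensemble_logits(student_logits)
    teacher_soft = softmax(teacher, temperature)
    losses = []
    for logits in student_logits:
        hard = cross_entropy(logits, label)
        soft = kl_divergence(teacher_soft, softmax(logits, temperature))
        # The temperature**2 factor is the usual rescaling of the soft term.
        losses.append((1 - alpha) * hard + alpha * temperature ** 2 * soft)
    return teacher, losses


# Two hypothetical students predicting over three classes; true class is 0.
students = [[2.0, 0.5, -1.0], [1.0, 1.5, 0.0]]
teacher, losses = online_distillation_losses(students, label=0)
print(teacher, losses)
```

Because the teacher is recomputed from the students at every step, no pretrained teacher is needed, which matches the abstract's description. How gradients flow through the teacher and whether intermediate features are also distilled are design choices this sketch leaves out.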

Code Implementations: 1 repo