CV | LG | Sep 17, 2019

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

arXiv:1909.08097v3 | 89 citations
AI Analysis

This work addresses efficiency and generalization challenges in deep learning for computer vision by offering a method for building compact networks that maintain high performance. The contribution is incremental: it builds on existing knowledge distillation and ensemble techniques.

The paper tackles the high computational and memory costs of ensemble CNN models. It proposes an Ensemble Knowledge Distillation (EKD) framework that trains a compact student network with parallel branches using knowledge from multiple teacher networks. The authors report improved classification accuracy and model generalization, especially with limited training data, and better results than state-of-the-art knowledge distillation methods on the CIFAR-10 and CIFAR-100 datasets.

Ensemble models composed of deep Convolutional Neural Networks (CNNs) have shown significant improvements in model generalization, but at the cost of large computation and memory requirements. In this paper, we present a framework for learning compact CNN models with improved classification performance and model generalization. For this, we propose a CNN architecture of a compact student model with parallel branches, which are trained using ground-truth labels and information from high-capacity teacher networks in an ensemble learning fashion. Our framework provides two main benefits: (i) Distilling knowledge from different teachers into the student network promotes heterogeneity in feature learning at different branches of the student network and enables the network to learn diverse solutions to the target problem. (ii) Coupling the branches of the student network through ensembling encourages collaboration and improves the quality of the final predictions by reducing variance in the network outputs. Experiments on the well-established CIFAR-10 and CIFAR-100 datasets show that our Ensemble Knowledge Distillation (EKD) improves classification accuracy and model generalization, especially in situations with limited training data. Experiments also show that our EKD-based compact networks outperform state-of-the-art knowledge distillation methods in terms of mean accuracy on the test datasets.
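
The training signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formulation, so everything below is an assumption, including the one-teacher-per-branch pairing, the logit-averaging ensemble, the temperature-scaled KL distillation term, the weights `alpha` and `temperature`, and all function names (such as the hypothetical `ekd_loss`). It uses only the Python standard library and operates on a single example's logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ensemble_logits(branch_logits):
    """Couple the branches by averaging their logits class by class."""
    n = len(branch_logits)
    return [sum(column) / n for column in zip(*branch_logits)]

def ekd_loss(branch_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Hypothetical combined loss (an assumption, not the paper's formula).

    - ground-truth cross-entropy on the ensembled output,
    - per branch: ground-truth cross-entropy plus a distillation term
      that pulls the branch toward its own assigned teacher.
    """
    assert len(branch_logits) == len(teacher_logits)
    loss = cross_entropy(ensemble_logits(branch_logits), label)
    for student, teacher in zip(branch_logits, teacher_logits):
        soft_teacher = softmax(teacher, temperature)
        soft_student = softmax(student, temperature)
        # The T^2 factor is the usual scaling for soft-target distillation.
        distill = temperature ** 2 * kl_divergence(soft_teacher, soft_student)
        loss += (1 - alpha) * cross_entropy(student, label) + alpha * distill
    return loss

# Two student branches, each paired with a different (made-up) teacher.
branches = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
teachers = [[4.0, 0.0, -2.0], [3.0, 1.0, -1.0]]
print(ekd_loss(branches, teachers, label=0))
```

Pairing each branch with a different teacher is one way to encourage the heterogeneity mentioned in benefit (i), while the loss on the averaged output corresponds loosely to the branch coupling in benefit (ii). Other pairings or ensembling rules would fit the abstract's description equally well.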

Code Implementations: 2 repos