LG · AI · CV · Dec 3, 2018

Knowledge Distillation with Feature Maps for Image Classification

arXiv:1812.00660v1 · 46 citations
Originality: Incremental advance
AI Analysis

The paper addresses model deployment for practitioners who need efficient deep learning models. Its contribution is incremental because it builds on existing knowledge distillation techniques.

The paper aims to reduce the computation cost and latency of deep learning models for image classification. It proposes KDFM, a knowledge distillation method that combines feature maps, a shared classifier, and a generative adversarial network. With KDFM, a 4-layer CNN mimics DenseNet-40 and MobileNet mimics DenseNet-100, each with less than 1% accuracy loss on CIFAR-100, while running inference 2-6 times faster with a smaller model.

Model reduction, which eases the computation cost and latency of complex deep learning architectures, has received increasing attention owing to its importance in model deployment. One promising method is knowledge distillation (KD), which creates a fast-to-execute student model that mimics a large teacher network. In this paper, we propose a method called KDFM (Knowledge Distillation with Feature Maps), which improves the effectiveness of KD by learning the feature maps from the teacher network. The two major techniques used in KDFM are a shared classifier and a generative adversarial network. Experimental results show that KDFM can use a four-layer CNN to mimic DenseNet-40 and MobileNet to mimic DenseNet-100. Both student networks lose less than 1% accuracy compared with their teacher models on the CIFAR-100 dataset. The student networks are 2-6 times faster than their teacher models at inference, and the model size of MobileNet is less than half that of DenseNet-100.
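
To make the idea of distilling from feature maps concrete, here is a minimal sketch of a generic distillation objective. It is not the KDFM objective: the abstract says KDFM relies on a shared classifier and a generative adversarial network, and a plain mean squared error between feature maps is swapped in here only for illustration. The function names, the `alpha` weight, and the `temperature` value are all assumptions, and feature maps are represented as flat lists of floats.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def feature_map_loss(student_map, teacher_map):
    """Mean squared error between two flattened feature maps of equal length.

    Assumption: a stand-in for KDFM's adversarial feature-map matching.
    """
    if len(student_map) != len(teacher_map):
        raise ValueError("feature maps must have the same number of elements")
    squared = [(s - t) ** 2 for s, t in zip(student_map, teacher_map)]
    return sum(squared) / len(squared)


def distillation_loss(student_map, teacher_map, student_logits, teacher_logits,
                      alpha=0.5, temperature=4.0):
    """Weighted sum of the feature-map term and the soft-target term.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    feature_term = feature_map_loss(student_map, teacher_map)
    logit_term = soft_target_loss(student_logits, teacher_logits, temperature)
    return alpha * feature_term + (1.0 - alpha) * logit_term
```

In a real training loop the student's parameters would be updated to minimize this quantity while the teacher stays fixed. When the student and teacher feature maps differ in shape, an extra projection layer on the student side is a common workaround, which this sketch leaves out.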

