CV, LG, NE | Jul 5, 2017

Like What You Like: Knowledge Distill via Neuron Selectivity Transfer

arXiv:1707.01219v2 | 196 citations
Originality: Incremental advance
AI Analysis

This paper addresses the high storage and computational costs of deep neural networks, which limit their use in applications that require efficient models. It is an incremental improvement in knowledge transfer techniques.

The paper tackles the compression and acceleration of deep neural networks. It proposes a knowledge transfer method that uses Maximum Mean Discrepancy to match neuron selectivity patterns between teacher and student networks. The authors report significantly improved student network performance across multiple datasets and on other tasks such as object detection.

Although deep neural networks have demonstrated extraordinary power in various applications, their superior performance comes at the expense of high storage and computational costs. Consequently, the acceleration and compression of neural networks have attracted much attention recently. Knowledge Transfer (KT), which aims to train a smaller student network by transferring knowledge from a larger teacher model, is one of the popular solutions. In this paper, we propose a novel knowledge transfer method by treating it as a distribution matching problem. In particular, we match the distributions of neuron selectivity patterns between teacher and student networks. To achieve this goal, we devise a new KT loss function that minimizes the Maximum Mean Discrepancy (MMD) metric between these distributions. Combined with the original loss function, our method can significantly improve the performance of student networks. We validate the effectiveness of our method across several datasets, and further combine it with other KT methods to explore the best possible results. Last but not least, we fine-tune the model on other tasks such as object detection. The results are also encouraging, which confirms the transferability of the learned features.
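
To make the distribution-matching idea in the abstract concrete, here is a minimal NumPy sketch of an MMD-based transfer loss. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not fix several details, so the following are assumptions: each channel's flattened, L2-normalized spatial activation map is treated as one sample of a "neuron selectivity pattern"; the kernel is a degree-2 polynomial kernel; the MMD estimate is the simple biased one; and the function names (`nst_loss`, `total_loss`) and the `weight` parameter are hypothetical.

```python
import numpy as np


def l2_normalize_rows(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against all-zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def poly_kernel(a, b, degree=2, offset=0.0):
    """Polynomial kernel matrix between the rows of a and the rows of b.

    Assumption: the kernel choice is illustrative; other kernels also work.
    """
    return (a @ b.T + offset) ** degree


def mmd2(x, y, kernel=poly_kernel):
    """Biased estimate of squared MMD between sample sets x and y (rows = samples)."""
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def nst_loss(teacher_fmap, student_fmap):
    """MMD-based matching loss between two feature maps.

    teacher_fmap: array of shape (C_teacher, H, W)
    student_fmap: array of shape (C_student, H, W), same H and W
    Channel counts may differ; each channel contributes one sample.
    """
    t = l2_normalize_rows(teacher_fmap.reshape(teacher_fmap.shape[0], -1))
    s = l2_normalize_rows(student_fmap.reshape(student_fmap.shape[0], -1))
    return mmd2(t, s)


def total_loss(task_loss, teacher_fmap, student_fmap, weight=1.0):
    """Original task loss plus the weighted transfer term (weight is hypothetical)."""
    return task_loss + weight * nst_loss(teacher_fmap, student_fmap)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 4, 4))  # stand-in teacher activations
    student = rng.normal(size=(4, 4, 4))  # stand-in student activations
    print("transfer loss:", nst_loss(teacher, student))
    print("combined loss:", total_loss(0.7, teacher, student, weight=0.5))
```

Because the loss compares sets of channel patterns rather than channels one by one, it is unchanged when channels are reordered and it does not require teacher and student to have the same number of channels. In a real training loop the same computation would be written in an autodiff framework so that gradients flow into the student.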

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
