CV · May 18, 2018

Recurrent knowledge distillation

arXiv:1805.07170v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the need for more compact neural networks in resource-constrained applications. The advance is incremental: it builds on existing knowledge distillation and residual-layer techniques.

The authors reduce the size of the student network in knowledge distillation by recasting multiple residual layers of the teacher as a single recurrent student layer. They report a lower parameter count at little loss in accuracy on CIFAR-10, Scenes, and MiniPlaces.

Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy.
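
The weight-sharing idea in the abstract can be illustrated with a minimal sketch. It assumes toy fully connected residual blocks rather than the convolutional layers a real network would use, and it does not reproduce any of the paper's three recurrent variants. The names `make_block`, `residual_step`, `count_params`, and `forward` are hypothetical helpers introduced only for this illustration. The "teacher" stacks several distinct residual blocks, while the "student" applies one shared block the same number of times, so the unrolled depth matches but only one block's parameters are stored.

```python
import random

def make_block(width, rng):
    """Hypothetical residual block: one square weight matrix plus a bias vector."""
    return {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(width)],
        "b": [0.0] * width,
    }

def residual_step(block, x):
    """Apply one residual update: x + relu(W x + b)."""
    out = []
    for i, row in enumerate(block["W"]):
        pre = sum(w * v for w, v in zip(row, x)) + block["b"][i]
        out.append(x[i] + max(0.0, pre))
    return out

def count_params(blocks):
    """Count parameters, counting a block that appears several times only once."""
    unique = {id(b): b for b in blocks}
    return sum(len(b["W"]) * len(b["W"][0]) + len(b["b"]) for b in unique.values())

def forward(blocks, x):
    """Run the input through the blocks in order."""
    for block in blocks:
        x = residual_step(block, x)
    return x

rng = random.Random(0)
width, depth = 8, 3

# Teacher-style stack: three distinct residual blocks.
teacher_blocks = [make_block(width, rng) for _ in range(depth)]

# Student-style recurrent layer: one block reused for three steps.
shared_block = make_block(width, rng)
student_blocks = [shared_block] * depth

teacher_params = count_params(teacher_blocks)   # 3 * (8*8 + 8) = 216
student_params = count_params(student_blocks)   # 1 * (8*8 + 8) = 72

student_out = forward(student_blocks, [1.0] * width)
```

In this toy setting the recurrent student stores one third of the stacked parameters while performing the same number of residual updates. Whether accuracy is preserved depends on training the shared block against the teacher, which this sketch does not attempt.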

