CV · May 18, 2018

Recurrent knowledge distillation

Silvia L. Pintea, Yue Liu, Jan C. van Gemert

arXiv:1805.07170v1 · 1.71 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more compact neural networks in resource-constrained applications. It is an incremental advance, since it builds on existing knowledge distillation and residual layer techniques.

The authors reduce the size of student networks in knowledge distillation by recasting multiple residual layers from the teacher into a single recurrent student layer. This cuts the number of parameters with little loss in accuracy on CIFAR-10, Scenes, and MiniPlaces.

Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy.
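The core idea in the abstract, replacing several residual layers with one layer that is applied repeatedly, can be illustrated with a toy sketch. This is not the authors' architecture: the paper works with convolutional networks and proposes three recurrent variants, none of which are reproduced here. The dense layers, the names `residual_step`, `teacher_forward`, and `student_forward`, and the sizes `dim` and `depth` are all assumptions chosen only to show how weight sharing reduces the parameter count.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def residual_step(W, v):
    # One residual layer (toy dense version): v + relu(W v)
    return [x + r for x, r in zip(v, relu(matvec(W, v)))]


def make_weights(dim, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def teacher_forward(weights, v):
    # Teacher: a stack of residual layers, each with its own weight matrix.
    for W in weights:
        v = residual_step(W, v)
    return v


def student_forward(W, v, steps):
    # Student: a single recurrent layer, i.e. the same weights applied `steps` times.
    for _ in range(steps):
        v = residual_step(W, v)
    return v


def count_params(weights):
    return sum(len(row) for W in weights for row in W)


rng = random.Random(0)
dim, depth = 8, 4  # hypothetical sizes, for illustration only

teacher_weights = [make_weights(dim, rng) for _ in range(depth)]
student_weight = make_weights(dim, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

teacher_out = teacher_forward(teacher_weights, x)
student_out = student_forward(student_weight, x, depth)

teacher_params = count_params(teacher_weights)    # depth * dim * dim
student_params = count_params([student_weight])   # dim * dim
print(teacher_params, student_params)
```

In this sketch the student has `depth` times fewer parameters than the teacher while running the same number of residual steps. Training the student to match the teacher's outputs, which is the distillation step itself, is left out.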

