CVNov 19, 2018

Self-Referenced Deep Learning

arXiv:1811.07598v126 citations
Originality Incremental advance
AI Analysis

This addresses the problem of high computational overhead in knowledge distillation for practitioners needing efficient model deployment, though it is incremental as it builds on existing distillation methods.

The paper tackles the computational cost of training a separate teacher model in knowledge distillation by introducing Self-Referenced Deep Learning (SRDL), which distills knowledge from the in-training model back to itself, resulting in improved generalization performance with negligible extra cost across multiple datasets.

Knowledge distillation is an effective approach to transferring knowledge from a teacher neural network to a student target network for satisfying the low-memory and fast running requirements in practice use. Whilst being able to create stronger target networks compared to the vanilla non-teacher based learning strategy, this scheme needs to train additionally a large teacher model with expensive computational cost. In this work, we present a Self-Referenced Deep Learning (SRDL) strategy. Unlike both vanilla optimisation and existing knowledge distillation, SRDL distils the knowledge discovered by the in-training target model back to itself to regularise the subsequent learning procedure therefore eliminating the need for training a large teacher model. SRDL improves the model generalisation performance compared to vanilla learning and conventional knowledge distillation approaches with negligible extra computational cost. Extensive evaluations show that a variety of deep networks benefit from SRDL resulting in enhanced deployment performance on both coarse-grained object categorisation tasks (CIFAR10, CIFAR100, Tiny ImageNet, and ImageNet) and fine-grained person instance identification tasks (Market-1501).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes