LG · CV · Jul 20, 2021

Follow Your Path: a Progressive Method for Knowledge Distillation

arXiv:2107.09305v1 · 18 citations
Originality: Incremental advance
AI Analysis

This work addresses the deployment of deep neural networks in resource-limited settings by improving knowledge distillation, though the advance appears incremental because it builds on existing distillation techniques.

The paper tackles the problem that knowledge distillation is constrained by a converged teacher model, which can trap the student in poor local optima. It proposes ProKT, a progressive method that projects the teacher's supervision signals into the student's parameter space, and reports superior performance on image and text datasets.
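
For context, conventional knowledge distillation trains the student to match the temperature-softened outputs of a fixed, already converged teacher, which is the setting the summary describes as constraining. The sketch below is a minimal numpy version of that standard soft-target loss (Hinton-style), not ProKT itself; the function names and the `temperature=4.0` and `alpha=0.9` defaults are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax with temperature scaling, shifted for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Soft-target distillation loss (illustrative defaults):
    alpha * T^2 * KL(p_teacher^T || p_student^T) + (1 - alpha) * CE(labels, p_student)."""
    eps = 1e-12  # avoids log(0)
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL divergence from the fixed teacher's soft targets, averaged over the batch
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1).mean()
    # Ordinary hard-label cross-entropy at temperature 1
    p_s1 = softmax(student_logits, 1.0)
    ce = -np.log(p_s1[np.arange(len(labels)), labels] + eps).mean()
    # T^2 keeps the soft-target gradients on a scale comparable to the CE term
    return alpha * temperature**2 * kl + (1 - alpha) * ce
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains; the teacher here never adapts to the student, which is the rigidity ProKT is said to relax.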

Deep neural networks often have a huge number of parameters, which poses challenges for deployment in application scenarios with limited memory and computation capacity. Knowledge distillation is one approach to derive compact models from bigger ones. However, it has been observed that a converged heavy teacher model imposes a strong constraint on learning a compact student network and could make the optimization subject to poor local optima. In this paper, we propose ProKT, a new model-agnostic method that projects the supervision signals of a teacher model into the student's parameter space. Such projection is implemented by decomposing the training objective into local intermediate targets with an approximate mirror descent technique. The proposed method could be less sensitive to the quirks of optimization, which could result in a better local optimum. Experiments on both image and text datasets show that our proposed ProKT consistently achieves superior performance compared to other existing knowledge distillation methods.
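
The abstract does not state ProKT's update rule, so the following is only a generic illustration of what "a local intermediate target via mirror descent" can mean, and it is an assumption rather than the authors' procedure (which, per the abstract, acts on the teacher's supervision signals). On the probability simplex, one entropic mirror-descent step from the student's current distribution p solves argmin_q eta * <g, q> + KL(q || p), with closed form q ∝ p * exp(-eta * g). Taking g = -onehot(label) yields a target only a KL-controlled step away from the student. The names `mirror_step_target`, `kl`, and `eta` are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mirror_step_target(student_probs: np.ndarray, labels: np.ndarray, eta: float = 1.0) -> np.ndarray:
    """Hypothetical helper: one entropic mirror-descent step on the simplex.

    Solves argmin_q eta * <g, q> + KL(q || p) with g = -onehot(label);
    the minimizer is q ∝ p * exp(-eta * g), i.e. the true-class mass is scaled by exp(eta).
    """
    num_classes = student_probs.shape[-1]
    g = -np.eye(num_classes)[labels]  # gradient of the linear loss -<onehot(label), q>
    unnorm = student_probs * np.exp(-eta * g)
    return unnorm / unnorm.sum(axis=-1, keepdims=True)

def kl(q: np.ndarray, p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Row-wise KL(q || p); a student could be trained to minimize this toward q."""
    return np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)

# Usage: the target moves toward the label, but only by a step controlled by eta.
p_student = softmax(np.array([[1.0, 2.0, 0.5]]))
target = mirror_step_target(p_student, np.array([0]), eta=1.0)
print(target, kl(target, p_student))
```

With eta = 0 the target equals the student's own distribution, and as eta grows it approaches the one-hot label, so eta plays the role of a step size that keeps each intermediate target reachable from the student's current state.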
