LG · ML · Apr 10, 2023

A Survey on Recent Teacher-student Learning Studies

arXiv:2304.04615v1 · 4 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

The survey provides an overview of incremental improvements in knowledge distillation methods, aimed at researchers working on model compression and efficiency.

This survey reviews recent variants of knowledge distillation, such as teaching assistant distillation and curriculum distillation, that aim to improve how knowledge is transferred from a large, complex neural network to a smaller one. The reviewed variants report promising gains in accuracy.
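The summary does not state a loss function. As a reference point, the classical distillation objective of Hinton et al. (2015), which the surveyed variants build on, can be sketched in plain Python for a single example: the student matches the teacher's temperature-softened class distribution via KL divergence while also fitting the ground-truth label. The function names, the temperature of 4.0, and the weighting `alpha = 0.5` are illustrative assumptions, not values taken from the survey.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracts the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Hinton-style distillation loss for one example (illustrative defaults).

    alpha weights the soft, teacher-matching term; (1 - alpha) weights the
    ordinary cross-entropy on the true label, computed at temperature 1.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scale by T^2 so the soft term's gradients stay comparable as T changes.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [5.0, 2.0, 0.5]  # made-up logits for a 3-class example
    student = [3.0, 2.5, 0.0]
    print(kd_loss(student, teacher, label=0))
```

The `temperature ** 2` factor follows Hinton et al.: softening both distributions by T shrinks the soft-target gradients by roughly 1/T², so rescaling keeps the soft and hard terms balanced when T is tuned.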

Knowledge distillation is a method of transferring the knowledge learned by a complex deep neural network (DNN), the teacher, to a smaller and faster DNN, the student, while preserving accuracy. Recent variants include teaching assistant distillation, curriculum distillation, mask distillation, and decoupling distillation, which aim to improve knowledge distillation by introducing additional components or by changing the learning process. Teaching assistant distillation places an intermediate model, the teaching assistant, between teacher and student, while curriculum distillation organizes learning along a curriculum similar to human education. Mask distillation focuses on transferring the attention mechanism learned by the teacher, and decoupling distillation decouples the distillation loss from the task loss. Overall, these variants have shown promising results in improving the accuracy of the distilled student.
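The summary describes curriculum distillation only as following a curriculum similar to human education. One simple way to realize that idea, sketched here as an assumption rather than the method of any surveyed paper, is an easy-to-hard data curriculum: score each training example by the teacher's cross-entropy on its label, sort from easiest to hardest, and let the student distil on a growing prefix of that ordering. The helpers `teacher_difficulty`, `linear_pacing`, and `curriculum_subset` and the starting fraction of 0.3 are hypothetical.

```python
import math
from typing import Sequence


def teacher_difficulty(teacher_probs: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> list[float]:
    """Hypothetical difficulty score: teacher cross-entropy on the true label.

    Assumption: a higher teacher loss marks a harder example.
    """
    return [-math.log(max(probs[y], 1e-12)) for probs, y in zip(teacher_probs, labels)]


def linear_pacing(epoch: int, total_epochs: int, start_frac: float = 0.3) -> float:
    """Fraction of the easy-to-hard ordering visible at a given epoch.

    Grows linearly from start_frac at epoch 0 to 1.0 at the final epoch.
    """
    if total_epochs <= 1:
        return 1.0
    progress = epoch / (total_epochs - 1)
    return min(1.0, start_frac + (1.0 - start_frac) * progress)


def curriculum_subset(difficulty: Sequence[float], epoch: int, total_epochs: int,
                      start_frac: float = 0.3) -> list[int]:
    """Indices the student distils on at this epoch, easiest first."""
    order = sorted(range(len(difficulty)), key=lambda i: difficulty[i])
    frac = linear_pacing(epoch, total_epochs, start_frac)
    k = max(1, math.ceil(frac * len(order)))
    return order[:k]


if __name__ == "__main__":
    # Made-up teacher probabilities for four examples of a 2-class task.
    teacher_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
    labels = [0, 0, 0, 1]
    difficulty = teacher_difficulty(teacher_probs, labels)
    for epoch in range(4):
        print(epoch, curriculum_subset(difficulty, epoch, total_epochs=4))
```

Curriculum-based distillation methods can differ in what they schedule (for example training samples, the distillation temperature, or a sequence of teacher checkpoints). The linear pacing function here is only the simplest choice and could be swapped for a step or exponential schedule without changing the other helpers.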

