CV, LG · Apr 10, 2019

Relational Knowledge Distillation

arXiv:1904.05068v2 · 1875 citations
AI Analysis

This paper addresses efficient knowledge transfer in machine learning, particularly for model compression and metric learning. The novelty of its method design is assessed as incremental.

The paper transfers knowledge from teacher to student models through relational knowledge distillation (RKD), which matches the mutual relations among data examples rather than the teacher's individual outputs. The authors report significant performance improvements; in metric learning, students outperform their teachers and reach state-of-the-art results on standard benchmark datasets.

Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic the teacher's output activations for individual data examples. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves student models by a significant margin. In particular, for metric learning, it allows students to outperform their teachers, achieving the state of the art on standard benchmark datasets.
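To make the distance-wise and angle-wise losses concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation. The abstract does not spell out the formulas, so the details are assumptions: pairwise distances are divided by their mean, angles are compared through cosines, and teacher and student relations are compared with a Huber penalty. All function names (`huber`, `distance_loss`, `angle_loss`, and the helpers) are hypothetical, and the sketch averages over unordered pairs and triplets for simplicity. It computes loss values only; training would need an autodiff framework.

```python
import math
from itertools import combinations


def huber(x, y, delta=1.0):
    """Huber penalty between two scalars (assumed choice of penalty)."""
    d = abs(x - y)
    return 0.5 * d * d if d <= delta else delta * (d - 0.5 * delta)


def _diff(a, b):
    return [p - q for p, q in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def normalized_distances(embs):
    """Pairwise Euclidean distances divided by their mean (scale-invariant)."""
    dists = [_norm(_diff(embs[i], embs[j]))
             for i, j in combinations(range(len(embs)), 2)]
    if not dists:
        return []
    mean = sum(dists) / len(dists)
    return [d / mean if mean > 0 else 0.0 for d in dists]


def angle_cosines(embs, eps=1e-12):
    """Cosine of the angle at vertex j for every triplet (i, j, k)."""
    cosines = []
    n = len(embs)
    for j in range(n):
        others = [m for m in range(n) if m != j]
        for i, k in combinations(others, 2):
            e_ij = _diff(embs[i], embs[j])
            e_kj = _diff(embs[k], embs[j])
            denom = max(_norm(e_ij), eps) * max(_norm(e_kj), eps)
            cosines.append(sum(p * q for p, q in zip(e_ij, e_kj)) / denom)
    return cosines


def _mean_penalty(t_rel, s_rel):
    if not t_rel:
        return 0.0
    return sum(huber(t, s) for t, s in zip(t_rel, s_rel)) / len(t_rel)


def distance_loss(teacher, student):
    """Distance-wise loss: penalize differences in normalized pair distances."""
    return _mean_penalty(normalized_distances(teacher),
                         normalized_distances(student))


def angle_loss(teacher, student):
    """Angle-wise loss: penalize differences in triplet angle cosines."""
    return _mean_penalty(angle_cosines(teacher), angle_cosines(student))


# Teacher and student embeddings for the same three examples.
teacher = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Same shape at twice the scale, embedded in 3-D: relations are preserved.
student_same = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
# Stretched along one axis: relations are distorted.
student_bad = [[0.0, 0.0], [1.0, 0.0], [0.0, 5.0]]

print(distance_loss(teacher, student_same), angle_loss(teacher, student_same))
print(distance_loss(teacher, student_bad), angle_loss(teacher, student_bad))
```

Because both losses depend only on relations within each embedding space, the teacher and student may use different embedding dimensions, as `student_same` does here. A student that reproduces the teacher's geometry up to scale incurs essentially zero loss, while the stretched student is penalized by both terms.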

Code Implementations: 3 repos
