CV | AI | Jul 16, 2024

Relational Representation Distillation

arXiv:2407.12073v53 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of transferring knowledge from large to compact models in machine learning, offering an incremental improvement over existing distillation techniques.

The paper tackles knowledge distillation by proposing a method that preserves relative relationships between instances. The authors report that it significantly outperforms existing methods across diverse tasks and sometimes even surpasses the teacher network.

Knowledge distillation involves transferring knowledge from large, cumbersome teacher models to more compact student models. The standard approach minimizes the Kullback-Leibler (KL) divergence between the probabilistic outputs of a teacher and student network. However, this approach fails to capture important structural relationships in the teacher's internal representations. Recent advances have turned to contrastive learning objectives, but these methods impose overly strict constraints through instance-discrimination, forcing apart semantically similar samples even when they should maintain similarity. This motivates an alternative objective by which we preserve relative relationships between instances. Our method employs separate temperature parameters for teacher and student distributions, with sharper student outputs, enabling precise learning of primary relationships while preserving secondary similarities. We show theoretical connections between our objective and both InfoNCE loss and KL divergence. Experiments demonstrate that our method significantly outperforms existing knowledge distillation methods across diverse knowledge transfer tasks, achieving better alignment with teacher models and sometimes even outperforming the teacher network.
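
The abstract describes the objective only in words, so the following is a minimal sketch of one way such a relational loss could look, not the authors' implementation. It assumes that each instance is compared against a shared set of anchor embeddings (hypothetical here: they could come from the batch or a memory bank), that similarities are cosine similarities, and that the loss is the KL divergence from the teacher's similarity distribution to the student's. The function name, the anchor arguments, and the default temperatures `tau_s` and `tau_t` are illustrative assumptions; only the use of separate temperatures with a sharper student comes from the abstract.

```python
import numpy as np

def _l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def _log_softmax(z):
    # Numerically stable row-wise log-softmax.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def relational_kd_loss(student_emb, teacher_emb,
                       student_anchors, teacher_anchors,
                       tau_s=0.02, tau_t=0.1):
    """KL(teacher || student) between similarity distributions over anchors.

    Assumed shapes: *_emb is (batch, dim), *_anchors is (num_anchors, dim).
    Student and teacher may use different embedding dims, because each
    network is only compared against anchors from its own space.
    tau_s < tau_t gives the sharper student distribution (assumed values).
    """
    sim_s = _l2_normalize(student_emb) @ _l2_normalize(student_anchors).T
    sim_t = _l2_normalize(teacher_emb) @ _l2_normalize(teacher_anchors).T
    log_p_s = _log_softmax(sim_s / tau_s)
    log_p_t = _log_softmax(sim_t / tau_t)
    p_t = np.exp(log_p_t)
    # Per-instance KL divergence, averaged over the batch.
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=1)
    return float(kl.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 8))    # 4 instances, teacher dim 8
    student = rng.normal(size=(4, 5))    # same instances, student dim 5
    t_anchors = rng.normal(size=(16, 8))
    s_anchors = rng.normal(size=(16, 5))
    print(relational_kd_loss(student, teacher, s_anchors, t_anchors))
```

Because the target is a soft distribution over all anchors rather than a single positive, similar samples are not forced apart the way they are under instance discrimination; how closely this matches the paper's exact formulation is not something the abstract settles.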

Code Implementations: 1 repo
