LG · CV · Feb 4

REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency

arXiv:2602.04677v1 · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses a practical bottleneck in knowledge distillation, unreliable teacher predictions, by providing a more robust method with negligible computational overhead. The advance is incremental because it builds on existing distillation techniques.

The paper tackles unreliable teacher predictions in knowledge distillation by introducing REDistill, a robust framework that uses a power divergence loss. It improves student accuracy on CIFAR-100 and ImageNet-1k without model-specific hyper-parameter tuning.

Knowledge Distillation (KD) transfers knowledge from a large teacher model to a smaller student by aligning their predictive distributions. However, conventional KD formulations - typically based on Kullback-Leibler divergence - assume that the teacher provides reliable soft targets. In practice, teacher predictions are often noisy or overconfident, and existing correction-based approaches rely on ad-hoc heuristics and extensive hyper-parameter tuning, which hinders generalization. We introduce REDistill (Robust Estimator Distillation), a simple yet principled framework grounded in robust statistics. REDistill replaces the standard KD objective with a power divergence loss, a generalization of KL divergence that adaptively downweights unreliable teacher output while preserving informative logit relationships. This formulation provides a unified and interpretable treatment of teacher noise, requires only logits, integrates seamlessly into existing KD pipelines, and incurs negligible computational overhead. Extensive experiments on CIFAR-100 and ImageNet-1k demonstrate that REDistill consistently improves student accuracy in diverse teacher-student architectures. Remarkably, it achieves these gains without model-specific hyper-parameter tuning, underscoring its robustness and strong generalization to unseen teacher-student pairs.
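The abstract describes the loss only as "a power divergence loss, a generalization of KL divergence" and does not give its exact form. As a rough illustration, the sketch below assumes the Cressie-Read power divergence family, which recovers KL(p || q) as its parameter goes to 0. The function names, the parameter `lam`, the default `temperature`, and the choice of teacher as the first argument are all assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def power_divergence(p, q, lam):
    """Cressie-Read power divergence (assumed form, not confirmed by the paper).

    D_lam(p || q) = 1 / (lam * (lam + 1)) * sum_i p_i * ((p_i / q_i)**lam - 1)
    The limits lam -> 0 and lam -> -1 give KL(p || q) and KL(q || p).
    """
    if abs(lam) < 1e-8:
        return kl_divergence(p, q)
    if abs(lam + 1.0) < 1e-8:
        return kl_divergence(q, p)
    total = sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q))
    return total / (lam * (lam + 1.0))


def distill_loss(teacher_logits, student_logits, lam=0.5, temperature=4.0):
    """Hypothetical logit-only distillation loss for a single example."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return power_divergence(p, q, lam)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5, -2.0]
    student = [2.5, 1.5, 0.0, -1.0]
    for lam in (0.0, 0.5, 1.0):
        print(lam, distill_loss(teacher, student, lam=lam))
```

Setting `lam=0.0` falls back to the standard KL objective, so a loss of this shape could be dropped into an existing KD pipeline that already works from teacher and student logits. Which parameter range actually downweights unreliable teacher outputs is a question for the paper itself.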

