CV · Nov 23, 2023

Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning

arXiv:2311.13934v2 · 9 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses performance issues in efficient student models for computer vision tasks, offering an incremental improvement over existing knowledge distillation techniques.

The paper addresses the limitations of KL divergence in knowledge distillation by proposing Robustness-Reinforced Knowledge Distillation (R2KD), which combines correlation distance with network pruning. It reports better performance than state-of-the-art methods on datasets including CIFAR-100 and ImageNet.

The improvement in the performance of efficient and lightweight models (i.e., the student model) is achieved through knowledge distillation (KD), which involves transferring knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the KL divergence's mode-averaging nature hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to excessively focus on specific modes, which fails to convey an abundant amount of valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore, in previous KD approaches, we observed that data augmentation, a technique aimed at enhancing a model's generalization, can have an adverse impact. Therefore, we propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning. This approach enables KD to effectively incorporate data augmentation for performance improvement. Extensive experiments on various datasets, including CIFAR-100, FGVR, TinyImagenet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.
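
To make the contrast in the abstract concrete, the sketch below computes two ways of measuring how far a student's predicted distribution is from a teacher's: the KL divergence used by most KD methods, and a correlation distance. The abstract does not give the paper's exact formulation, so the correlation distance here is assumed to be one minus the Pearson correlation between the two probability vectors. The function names, the temperature parameter, and the example logits are all illustrative and not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student), the direction typically used as a KD loss."""
    return sum(
        t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student)
    )


def correlation_distance(teacher, student, eps=1e-12):
    """Assumed form: 1 - Pearson correlation between the two probability vectors."""
    mean_t = sum(teacher) / len(teacher)
    mean_s = sum(student) / len(student)
    dev_t = [t - mean_t for t in teacher]
    dev_s = [s - mean_s for s in student]
    cov = sum(a * b for a, b in zip(dev_t, dev_s))
    norm = math.sqrt(sum(a * a for a in dev_t)) * math.sqrt(sum(b * b for b in dev_s))
    return 1.0 - cov / (norm + eps)


# Hypothetical logits for a 4-class problem.
teacher_probs = softmax([4.0, 2.0, 1.0, 0.5], temperature=2.0)
student_probs = softmax([3.0, 2.5, 0.5, 0.2], temperature=2.0)

print("KL divergence:       ", kl_divergence(teacher_probs, student_probs))
print("Correlation distance:", correlation_distance(teacher_probs, student_probs))
```

Both quantities are zero when the student matches the teacher exactly. The correlation distance depends only on how the class probabilities co-vary, so it stays bounded between 0 and 2, whereas the KL divergence is unbounded and weights each class by the teacher's probability.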
