CV · Feb 10, 2025

Multi-Level Decoupled Relational Distillation for Heterogeneous Architectures

arXiv:2502.06189v1 · 1 citation · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of transferring knowledge effectively between different neural network architectures, a problem relevant to machine learning researchers and practitioners. It represents an incremental improvement.

The paper tackled the problem of limited performance in heterogeneous knowledge distillation by proposing a novel framework, MLDR-KD, which improved student model performance with gains of up to 4.86% on CIFAR-100 and 2.78% on Tiny-ImageNet compared to the best available method.

Heterogeneous distillation is an effective way to transfer knowledge from cross-architecture teacher models to student models. However, existing heterogeneous distillation methods do not take full advantage of the dark knowledge hidden in the teacher's output, which limits their performance. To this end, we propose a novel framework named Multi-Level Decoupled Relational Knowledge Distillation (MLDR-KD) to unleash the potential of relational distillation in heterogeneous distillation. Concretely, we first introduce Decoupled Finegrained Relation Alignment (DFRA) at both the logit and feature levels to balance the trade-off between distilled dark knowledge and the confidence in the correct category of the heterogeneous teacher model. Then, a Multi-Scale Dynamic Fusion (MSDF) module is applied to dynamically fuse the projected logits of multi-scale features from different stages of the student model, further improving the performance of our method at the feature level. We verify our method on four architectures (CNNs, Transformers, MLPs and Mambas) and two datasets (CIFAR-100 and Tiny-ImageNet). Compared with the best available method, our MLDR-KD improves student model performance with gains of up to 4.86% on CIFAR-100 and 2.78% on Tiny-ImageNet, showing robustness and generality in heterogeneous distillation. Code will be released soon.
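
The abstract does not give the DFRA formulation, so the following is only a rough sketch of the general idea it describes: separating the teacher's confidence in the correct class from the "dark knowledge" spread over the remaining classes, and weighting the two terms independently. The function names, the `alpha`/`beta` weights, and the temperature value are assumptions made for illustration; this is a generic target/non-target split of a softened distillation loss, not the authors' method.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable, temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def split_target(probs, target):
    """Return (probability of the target class, renormalized non-target distribution)."""
    p_target = probs[target]
    rest = [p for i, p in enumerate(probs) if i != target]
    rest_sum = sum(rest)
    return p_target, [p / rest_sum for p in rest]

def decoupled_kd_loss(teacher_logits, student_logits, target,
                      temperature=4.0, alpha=1.0, beta=1.0):
    """Hypothetical decoupled distillation loss (illustrative assumption).

    alpha weights the match on correct-class confidence;
    beta weights the match on the non-target ("dark knowledge") distribution.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    t_pt, t_rest = split_target(t, target)
    s_pt, s_rest = split_target(s, target)
    # Binary distribution: correct class vs. everything else.
    confidence_term = kl_divergence([t_pt, 1.0 - t_pt], [s_pt, 1.0 - s_pt])
    # Distribution among the incorrect classes only.
    dark_term = kl_divergence(t_rest, s_rest)
    return alpha * confidence_term + beta * dark_term

# Usage: a student that disagrees with the teacher incurs a positive loss.
teacher = [2.0, 1.0, 0.1]
student = [0.1, 1.0, 2.0]
loss = decoupled_kd_loss(teacher, student, target=0, alpha=1.0, beta=2.0)
```

Raising `beta` relative to `alpha` puts more emphasis on the inter-class structure in the teacher's output and less on its confidence in the labeled class, which is the kind of trade-off the abstract refers to.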

