CV · Jul 29, 2025

Cross-Architecture Distillation Made Simple with Redundancy Suppression

arXiv:2507.21844v1 · 4 citations · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the need for simple, efficient distillation methods, but its contribution appears incremental because it builds on existing distillation techniques.

The paper tackles cross-architecture knowledge distillation by proposing a redundancy suppression method that extracts architecture-agnostic knowledge. It outperforms the pioneering OFA method on the CIFAR-100 and ImageNet-1k benchmarks with lower parameter overhead.

We describe a simple method for cross-architecture knowledge distillation, where the knowledge transfer is cast into a redundant information suppression formulation. Existing methods introduce sophisticated modules, architecture-tailored designs, and excessive parameters, which impair their efficiency and applicability. We propose to extract the architecture-agnostic knowledge in heterogeneous representations by reducing the redundant architecture-exclusive information. To this end, we present a simple redundancy suppression distillation (RSD) loss, which comprises cross-architecture invariance maximisation and feature decorrelation objectives. To prevent the student from entirely losing its architecture-specific capabilities, we further design a lightweight module that decouples the RSD objective from the student's internal representations. Our method is devoid of the architecture-specific designs and complex operations in the pioneering method of OFA. It outperforms OFA on CIFAR-100 and ImageNet-1k benchmarks with only a fraction of their parameter overhead, which highlights its potential as a simple and strong baseline to the cross-architecture distillation community.
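The abstract names the two parts of the RSD loss (cross-architecture invariance maximisation and feature decorrelation) but does not give a formula. The sketch below is one plausible reading, not the authors' implementation. It assumes a cross-correlation formulation: the diagonal of the student-teacher correlation matrix is pushed towards 1 (invariance) and the off-diagonal entries towards 0 (decorrelation). The function names, the `decorrelation_weight` parameter and its default value are hypothetical. The student features are assumed to have already passed through a lightweight projector that matches the teacher's feature dimension.

```python
import math


def standardise(features):
    """Return one list per feature dimension, normalised to zero mean and
    unit variance over the batch. `features` is a list of rows (samples)."""
    n = len(features)
    d = len(features[0])
    columns = []
    for j in range(d):
        col = [row[j] for row in features]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var) + 1e-8  # small epsilon avoids division by zero
        columns.append([(x - mean) / std for x in col])
    return columns


def cross_correlation(student, teacher):
    """d x d matrix whose entry (i, j) is the batch correlation between
    student dimension i and teacher dimension j."""
    s_cols = standardise(student)
    t_cols = standardise(teacher)
    n = len(student)
    return [[sum(a * b for a, b in zip(s, t)) / n for t in t_cols]
            for s in s_cols]


def rsd_style_loss(student, teacher, decorrelation_weight=0.005):
    """Assumed form of a redundancy-suppression loss (hypothetical):
    invariance term on the diagonal, decorrelation term off the diagonal."""
    c = cross_correlation(student, teacher)
    d = len(c)
    invariance = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    decorrelation = sum(c[i][j] ** 2
                        for i in range(d) for j in range(d) if i != j)
    return invariance + decorrelation_weight * decorrelation


# Toy batch: 4 samples, 2 uncorrelated feature dimensions.
teacher_feats = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
aligned_student = [[3.0 * a + 5.0, 3.0 * b + 5.0] for a, b in teacher_feats]
swapped_student = [[b, a] for a, b in teacher_feats]

aligned_loss = rsd_style_loss(aligned_student, teacher_feats)
swapped_loss = rsd_style_loss(swapped_student, teacher_feats)
```

Under these assumptions, a student whose features match the teacher's up to a per-dimension scale and shift receives a near-zero loss, while a student that encodes the same information in permuted dimensions is penalised. In practice the same computation would run on batched tensors in a deep learning framework, and the loss would be combined with the usual task loss.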

