CV · Aug 22, 2025

Expandable Residual Approximation for Knowledge Distillation

arXiv:2508.16050v1 · h-index: 3 · Has Code · IEEE Trans Neural Netw Learn Syst
Originality: Incremental advance
AI Analysis

This paper addresses the teacher-student capacity gap in knowledge distillation for computer vision, offering incremental improvements in model efficiency and performance.

The paper tackles the teacher-student capacity gap in knowledge distillation by proposing Expandable Residual Approximation (ERA), a method that improves Top-1 accuracy on ImageNet by 1.41% and AP on MS COCO by 1.40.

Knowledge distillation (KD) aims to transfer knowledge from a large-scale teacher model to a lightweight student model, significantly reducing computational and storage requirements. However, the inherent gap in learning capacity between teacher and student often prevents sufficient knowledge transfer, which has motivated numerous studies. Inspired by the progressive approximation principle in the Stone-Weierstrass theorem, we propose Expandable Residual Approximation (ERA), a novel KD method that decomposes the approximation of residual knowledge into multiple steps, reducing the difficulty of mimicking the teacher's representation through a divide-and-conquer approach. Specifically, ERA employs a Multi-Branched Residual Network (MBRNet) to implement this residual knowledge decomposition. Additionally, a Teacher Weight Integration (TWI) strategy is introduced to mitigate the capacity disparity by reusing the teacher's head weights. Extensive experiments show that ERA improves Top-1 accuracy on the ImageNet classification benchmark by 1.41% and AP on the MS COCO object detection benchmark by 1.40, and achieves leading performance across computer vision tasks. Code and models are available at https://github.com/Zhaoyi-Yan/ERA.
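The multi-step residual idea in the abstract can be illustrated with a toy example. The sketch below is not the ERA or MBRNet implementation. It assumes that each "branch" is a one-parameter least-squares fit along a fixed basis vector, and that the teacher and student outputs are plain lists of numbers. The names `fit_branch`, `progressive_residual_fit`, and `bases` are hypothetical. It shows only the general principle: each step fits whatever residual the previous steps left behind, so the squared error never increases.

```python
# Illustrative sketch only: NOT the ERA/MBRNet code from the paper.
# Assumption: each "branch" is a hypothetical one-parameter fit to the
# residual that remains after the student and all earlier branches.

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def fit_branch(residual, basis):
    """Return c * basis, where c is the least-squares scale matching the residual."""
    denom = dot(basis, basis)
    c = dot(residual, basis) / denom if denom else 0.0
    return [c * b for b in basis]

def progressive_residual_fit(teacher, student, bases):
    """Approximate teacher as student + sum of branch outputs, one branch per step."""
    approx = list(student)
    errors = []
    for basis in bases:
        # Residual knowledge still missing after the previous steps.
        residual = [t - a for t, a in zip(teacher, approx)]
        branch_out = fit_branch(residual, basis)
        approx = [a + r for a, r in zip(approx, branch_out)]
        # Squared error between teacher and the current approximation.
        errors.append(sum((t - a) ** 2 for t, a in zip(teacher, approx)))
    return approx, errors

# Toy data (made up for illustration).
teacher = [1.0, 2.0, 3.0, 4.0]
student = [0.5, 1.5, 2.0, 2.0]
bases = [
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 4.0, 9.0],
]

initial_error = sum((t - s) ** 2 for t, s in zip(teacher, student))
approx, errors = progressive_residual_fit(teacher, student, bases)
print(initial_error, errors)
```

In the paper's setting the branches would be learned network modules rather than fixed basis vectors. The toy version keeps only the divide-and-conquer structure of the decomposition.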

Code Implementations: 1 repo