CV · ML · Mar 9, 2025

FEDS: Feature and Entropy-Based Distillation Strategy for Efficient Learned Image Compression

Haisheng Fu, Jie Liang, Zhenman Fang, Jingning Han

arXiv:2503.06399v2 · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses barriers to the practical adoption of learned image compression in real-time or resource-limited scenarios, though it is an incremental advance that builds on existing distillation and compression methods.

The paper tackles the high computational cost and large model size of learned image compression. It proposes a feature and entropy-based distillation strategy (FEDS) that transfers knowledge from a high-capacity teacher to a lightweight student model. The resulting student nearly matches the teacher, with only a 1.24% BD-Rate increase on Kodak, while reducing parameters by about 63% and accelerating encoding/decoding by about 73%.
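A BD-Rate figure such as 1.24% summarizes how much extra bitrate one codec needs, on average, to reach the same quality as another. This page does not say how the numbers were computed; the sketch below assumes the standard Bjøntegaard delta-rate procedure (fit log-rate as a cubic polynomial of PSNR for each rate-distortion curve, then average the gap over the PSNR range both curves cover). The function name `bd_rate` and the choice of PSNR as the quality metric are assumptions for illustration.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (standard method, assumed here).

    Positive values mean the test codec needs more bits than the anchor
    at equal PSNR. Each curve needs at least four (rate, PSNR) points.
    """
    log_r1 = np.log(np.asarray(rate_anchor, dtype=float))
    log_r2 = np.log(np.asarray(rate_test, dtype=float))
    p1 = np.asarray(psnr_anchor, dtype=float)
    p2 = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial in PSNR for each curve.
    fit1 = np.polyfit(p1, log_r1, 3)
    fit2 = np.polyfit(p2, log_r2, 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(p1.min(), p2.min())
    hi = min(p1.max(), p2.max())
    int1 = np.polyint(fit1)
    int2 = np.polyint(fit2)
    area1 = np.polyval(int1, hi) - np.polyval(int1, lo)
    area2 = np.polyval(int2, hi) - np.polyval(int2, lo)

    # Average log-rate difference, converted back to a percentage.
    avg_log_diff = (area2 - area1) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

For example, a test curve whose bitrate is 1.24% higher than the anchor's at every PSNR point gives a BD-Rate of about 1.24%; a curve with lower bitrates gives a negative value.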

Learned image compression (LIC) methods have recently outperformed traditional codecs such as VVC in rate-distortion performance. However, their large models and high computational costs have limited their practical adoption. In this paper, we first construct a high-capacity teacher model by integrating Swin-Transformer V2-based attention modules, additional residual blocks, and expanded latent channels, thus achieving enhanced compression performance. Building on this foundation, we propose a Feature and Entropy-based Distillation Strategy (FEDS) that transfers key knowledge from the teacher to a lightweight student model. Specifically, we align intermediate feature representations and emphasize the most informative latent channels through an entropy-based loss. A staged training scheme refines this transfer in three phases: feature alignment, channel-level distillation, and final fine-tuning. Our student model nearly matches the teacher across Kodak (1.24% BD-Rate increase), Tecnick (1.17%), and CLIC (0.55%) while cutting parameters by about 63% and accelerating encoding/decoding by around 73%. Moreover, ablation studies indicate that FEDS generalizes effectively to transformer-based networks. The experimental results demonstrate that our approach strikes a compelling balance among compression performance, speed, and model parameters, making it well-suited for real-time or resource-limited scenarios.
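The abstract names three ingredients: intermediate feature alignment, an entropy-based loss that emphasizes the most informative latent channels, and a three-phase training schedule. The exact loss formulations and weights are not given here, so the following NumPy sketch is a hypothetical reading of them. It assumes feature alignment is a mean squared error over matched feature maps, channel importance is the teacher's estimated bits per latent channel (from its entropy-model likelihoods), and each stage switches on one distillation term at a time. All function names, the `STAGE_WEIGHTS` table, and the weighting rule are illustrative assumptions, not the authors' code.

```python
import numpy as np


def feature_alignment_loss(student_feats, teacher_feats):
    """MSE averaged over matched intermediate feature maps.

    Assumes each student map already has the teacher's shape
    (e.g. after a hypothetical 1x1 projection layer).
    """
    losses = [np.mean((s - t) ** 2) for s, t in zip(student_feats, teacher_feats)]
    return float(np.mean(losses))


def channel_entropy_weights(teacher_likelihoods, eps=1e-9):
    """Hypothetical per-channel weights proportional to the teacher's bits.

    teacher_likelihoods: array (C, H, W) of per-element probabilities from
    the entropy model; summing -log2(p) over H and W gives bits per channel.
    """
    bits = -np.log2(np.clip(teacher_likelihoods, eps, 1.0)).sum(axis=(1, 2))
    return bits / bits.sum()


def entropy_channel_loss(student_latent, teacher_latent, weights):
    """Channel-wise MSE on (C, H, W) latents, weighted toward high-entropy channels."""
    per_channel = np.mean((student_latent - teacher_latent) ** 2, axis=(1, 2))
    return float(np.sum(weights * per_channel))


# Hypothetical loss weights for the three training phases.
STAGE_WEIGHTS = {
    1: {"feature": 1.0, "channel": 0.0},  # feature alignment
    2: {"feature": 0.0, "channel": 1.0},  # channel-level distillation
    3: {"feature": 0.0, "channel": 0.0},  # fine-tuning on the R-D loss only
}


def total_loss(stage, rd_loss, student_feats, teacher_feats,
               student_latent, teacher_latent, teacher_likelihoods):
    """Rate-distortion loss plus whichever distillation terms the stage enables."""
    w = STAGE_WEIGHTS[stage]
    loss = rd_loss
    if w["feature"]:
        loss += w["feature"] * feature_alignment_loss(student_feats, teacher_feats)
    if w["channel"]:
        weights = channel_entropy_weights(teacher_likelihoods)
        loss += w["channel"] * entropy_channel_loss(
            student_latent, teacher_latent, weights)
    return loss
```

In an actual training loop these terms would be computed on framework tensors (for example in PyTorch) so gradients reach the student; NumPy is used here only to keep the arithmetic visible. Weighting by estimated bits means channels the teacher spends more rate on contribute more to the distillation loss, which is one plausible way to "emphasize the most informative latent channels."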
