CV · Nov 21, 2025

A Multi-Stage Optimization Framework for Deploying Learned Image Compression on FPGAs

arXiv:2511.17135v1 · 3.6

Originality: Highly original

AI Analysis

This paper addresses the problem of efficient hardware deployment for learned image compression. The contribution is incremental in that it builds on existing methods, but it introduces optimizations tailored to FPGAs.

This work tackles the challenge of deploying deep learning-based image compression models on resource-constrained FPGAs with a multi-stage optimization framework. Its quantization stage reduces the BD-rate overhead of a GDN-based model from 30% to 6.3%, and the subsequent hardware-aware optimizations cut computational complexity by over 20% with negligible impact on rate-distortion performance.

Deep learning-based image compression (LIC) has achieved state-of-the-art rate-distortion (RD) performance, yet deploying these models on resource-constrained FPGAs remains a major challenge. This work presents a complete, multi-stage optimization framework to bridge the gap between high-performance floating-point models and efficient, hardware-friendly integer-based implementations. First, we address the fundamental problem of quantization-induced performance degradation. We propose a Dynamic Range-Aware Quantization (DRAQ) method that uses statistically-calibrated activation clipping and a novel weight regularization scheme to counteract the effects of extreme data outliers and large dynamic ranges, successfully creating a high-fidelity 8-bit integer model. Second, building on this robust foundation, we introduce two hardware-aware optimization techniques tailored for FPGAs. A progressive mixed-precision search algorithm exploits FPGA flexibility to assign optimal, non-uniform bit-widths to each layer, minimizing complexity while preserving performance. Concurrently, a channel pruning method, adapted to work with the Generalized Divisive Normalization (GDN) layers common in LIC, removes model redundancy by eliminating inactive channels. Our comprehensive experiments show that the foundational DRAQ method reduces the BD-rate overhead of a GDN-based model from $30\%$ to $6.3\%$. The subsequent hardware-aware optimizations further reduce computational complexity by over $20\%$ with negligible impact on RD performance, yielding a final model that is both state-of-the-art in efficiency and superior in quality to existing FPGA-based LIC implementations.
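The abstract does not spell out how DRAQ calibrates its clipping ranges, so the following is only a minimal sketch of the general idea of statistically calibrated activation clipping, not the authors' method. It assumes a nearest-rank percentile as the calibration statistic and symmetric 8-bit quantization; the function names (`calibrate_clip`, `quantize_int8`, `bulk_mse`), the 99.9th percentile, and the synthetic activations are all hypothetical choices made for illustration.

```python
import math
import random


def calibrate_clip(samples, percentile=99.9):
    """Nearest-rank percentile of |samples|, used as the clipping threshold.

    Assumption: a percentile stands in for the paper's calibration statistic.
    """
    mags = sorted(abs(x) for x in samples)
    rank = max(1, math.ceil(percentile / 100.0 * len(mags)))
    return mags[rank - 1]


def quantize_int8(x, clip):
    """Symmetric 8-bit quantization: clamp to [-clip, clip], map to [-127, 127]."""
    scale = clip / 127.0
    clamped = max(-clip, min(clip, x))
    return int(round(clamped / scale)), scale


def bulk_mse(values, clip):
    """Mean squared quantize/dequantize error over `values` for a given clip."""
    total = 0.0
    for x in values:
        q, scale = quantize_int8(x, clip)
        total += (x - q * scale) ** 2
    return total / len(values)


# Synthetic activations: a Gaussian bulk plus a few extreme outliers.
random.seed(0)
bulk = [random.gauss(0.0, 1.0) for _ in range(10_000)]
outliers = [50.0, -50.0, 60.0, -70.0, 80.0]
activations = bulk + outliers

# Range taken from the largest magnitude vs. range taken from a percentile.
clip_max = max(abs(x) for x in activations)
clip_pct = calibrate_clip(activations, percentile=99.9)

print("max-based clip: ", clip_max, "bulk MSE:", bulk_mse(bulk, clip_max))
print("percentile clip:", clip_pct, "bulk MSE:", bulk_mse(bulk, clip_pct))
```

The comparison is made on the bulk values only. Clipping deliberately saturates the rare outliers in exchange for a finer step size on typical activations, and whether that trade pays off in rate-distortion terms is what the paper's experiments evaluate.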
