IV CV | Jan 23, 2025

On Disentangled Training for Nonlinear Transform in Learned Image Compression

Han Li, Shaohui Li, Wenrui Dai, Maida Cao, Nuowen Kan, Chenglin Li, Junni Zou, Hongkai Xiong

arXiv:2501.13751v3 | 19.313 citations | h-index: 41 | Has Code | ICLR

Originality: Incremental advance

AI Analysis

This work addresses training inefficiency in learned image compression for researchers and practitioners, offering a method to reduce training times from weeks to days. It is an incremental advance because it builds on existing nonlinear transform frameworks.

The paper tackles training inefficiency in learned image compression. It shows that energy compaction in nonlinear transforms consists of two components, feature decorrelation and uneven energy modulation, and it proposes a linear auxiliary transform with wavelet-based shortcuts to disentangle this process. The reported result is a training time reduction of up to 50% while maintaining competitive rate-distortion performance.

Learned image compression (LIC) has demonstrated superior rate-distortion (R-D) performance compared to traditional codecs, but it is challenged by training inefficiency: training a state-of-the-art model from scratch can take more than two weeks. Existing LIC methods overlook the slow convergence caused by compacting energy in learning nonlinear transforms. In this paper, we first reveal that such energy compaction consists of two components, i.e., feature decorrelation and uneven energy modulation. On this basis, we propose a linear auxiliary transform (AuxT) to disentangle energy compaction in training nonlinear transforms. The proposed AuxT obtains a coarse approximation to achieve efficient energy compaction, such that distribution fitting with the nonlinear transforms can be simplified to fine details. We then develop wavelet-based linear shortcuts (WLSs) for AuxT that leverage wavelet-based downsampling and orthogonal linear projection for feature decorrelation and subband-aware scaling for
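The three ingredients the abstract names for a wavelet-based linear shortcut (wavelet downsampling, an orthogonal linear projection, and per-subband scaling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the choice of a one-level Haar wavelet, the random orthogonal matrix, the order of operations, and the names `haar_downsample`, `random_orthogonal`, and `wavelet_linear_shortcut` are all assumptions made for illustration. In a trained model the projection and the gains would presumably be learned parameters.

```python
import numpy as np


def haar_downsample(x):
    """One-level orthonormal 2D Haar transform (illustrative choice of wavelet).

    x: array of shape (C, H, W) with even H and W.
    Returns an array of shape (4, C, H/2, W/2) ordered LL, LH, HL, HH.
    """
    a = x[:, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return np.stack([ll, lh, hl, hh])


def random_orthogonal(n, rng):
    """Hypothetical stand-in for a learned orthogonal matrix (QR of a Gaussian)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


def wavelet_linear_shortcut(x, q, gains):
    """Assumed pipeline: Haar downsampling, orthogonal channel mixing, subband gains.

    x:     (C, H, W) input feature map
    q:     (C, C) orthogonal matrix, applied like a 1x1 convolution in each subband
    gains: (4,) one scale per subband (LL, LH, HL, HH)
    """
    sub = haar_downsample(x)                      # (4, C, H/2, W/2)
    mixed = np.einsum("oc,schw->sohw", q, sub)    # mix channels, keep subbands apart
    return mixed * gains[:, None, None, None]     # subband-aware scaling


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
q = random_orthogonal(3, rng)
y = wavelet_linear_shortcut(x, q, gains=np.array([1.0, 0.5, 0.5, 0.25]))
```

Because both the Haar step and the channel mixing are orthonormal, the sketch preserves total signal energy when all gains equal 1, so any change in how energy is distributed comes only from the per-subband gains. That separation of decorrelation from energy modulation is the point the sketch is meant to show.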
