On Disentangled Training for Nonlinear Transform in Learned Image Compression
This work addresses training inefficiency in learned image compression for researchers and practitioners, offering a method that reduces training time from weeks to days. The contribution is incremental in that it builds on existing nonlinear transform frameworks.
The paper tackles training inefficiency in learned image compression. It shows that energy compaction in nonlinear transforms consists of feature decorrelation and uneven energy modulation, and proposes a linear auxiliary transform with wavelet-based shortcuts to disentangle this process, reducing training time by up to 50% while maintaining competitive rate-distortion performance.
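As a rough illustration of why a fixed wavelet can take over part of the energy-compaction work, the sketch below applies a single-level orthonormal Haar transform to a smooth synthetic image and measures how the energy is split across the four subbands. It is a minimal NumPy sketch, not the paper's AuxT or WLS implementation: the function names, the test image, and the choice of the Haar wavelet are all assumptions made for illustration, and the learned orthogonal projection and subband-aware scaling are not reproduced.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D orthonormal Haar transform of an (H, W) array.

    H and W must be even. Returns four (H/2, W/2) subbands:
    approximation, width-axis detail, height-axis detail, diagonal detail.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # local average (low-pass)
    detail_w = (a - b + c - d) / 2.0     # differences along the width axis
    detail_h = (a + b - c - d) / 2.0     # differences along the height axis
    detail_diag = (a - b - c + d) / 2.0  # diagonal differences
    return approx, detail_w, detail_h, detail_diag


def subband_energies(x):
    """Sum of squared coefficients in each Haar subband."""
    return np.array([np.sum(band ** 2) for band in haar_dwt2(x)])


def smooth_test_image(size=64, seed=0):
    """Hypothetical test input: a smooth pattern plus a little noise."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:size, 0:size] / size
    noise = 0.01 * rng.standard_normal((size, size))
    return np.sin(2 * np.pi * xs) + np.cos(2 * np.pi * ys) + noise


if __name__ == "__main__":
    img = smooth_test_image()
    energies = subband_energies(img)
    names = ["approx", "detail_w", "detail_h", "detail_diag"]
    # The transform is orthonormal, so the subband energies sum to the
    # energy of the input; the printed values are each subband's share.
    for name, share in zip(names, energies / energies.sum()):
        print(f"{name}: {share:.4f}")
```

For smooth inputs like this one, nearly all of the energy lands in the approximation subband while the total is preserved. This is the sense in which a linear, wavelet-based path can provide a coarse, energy-compacted approximation and leave the fine details to the nonlinear transform.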
Learned image compression (LIC) has demonstrated superior rate-distortion (R-D) performance compared to traditional codecs, but suffers from training inefficiency: training a state-of-the-art model from scratch can take more than two weeks. Existing LIC methods overlook the slow convergence caused by compacting energy in learning nonlinear transforms. In this paper, we first reveal that such energy compaction consists of two components, i.e., feature decorrelation and uneven energy modulation. On this basis, we propose a linear auxiliary transform (AuxT) to disentangle energy compaction in training nonlinear transforms. The proposed AuxT obtains a coarse approximation to achieve efficient energy compaction, so that distribution fitting with the nonlinear transforms can be simplified to fine details. We then develop wavelet-based linear shortcuts (WLSs) for AuxT that leverage wavelet-based downsampling and orthogonal linear projection for feature decorrelation and subband-aware scaling for