Enhancing Learned Image Compression via Cross Window-based Attention
This work addresses a bottleneck in learned image compression, namely weak modeling of local redundancy, which matters for applications requiring efficient storage and transmission. The contribution is incremental, however, since it builds on existing CNN and attention mechanisms.
The paper tackles the problem of capturing local redundancy in learned image compression by proposing a CNN-based solution with a feature encoding module and cross-scale window-based attention, achieving performance on par with state-of-the-art methods on the Kodak and CLIC datasets.
In recent years, learned image compression methods have demonstrated superior rate-distortion performance compared to traditional image compression methods. Recent methods utilize convolutional neural networks (CNNs), variational autoencoders (VAEs), invertible neural networks (INNs), and transformers. Despite their significant contributions, a main drawback of these models is their poor performance in capturing local redundancy. Therefore, to leverage global features along with local redundancy, we propose a CNN-based solution integrated with a feature encoding module. The feature encoding module encodes important features before feeding them to the CNN; the network then applies cross-scale window-based attention, which further captures local redundancy. Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field. Both the feature encoding module and the cross-scale window-based attention module in our architecture are flexible and can be incorporated into any other network architecture. We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods. Our code is publicly available at https://github.com/prmudgal/CWAM_IC_ISVC.
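To make the idea of window-based attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "cross-scale" can be approximated by running self-attention inside non-overlapping windows of several sizes and averaging the results with a residual connection; the function names (`window_partition`, `window_attention`, `cross_scale_window_attention`), the window sizes, and the use of identity projections in place of learned query/key/value weights are all illustrative assumptions. The actual module is defined in the repository linked above.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Returns an array of shape (num_windows, win * win, C).
    """
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = x.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)


def window_merge(windows, win, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) feature map."""
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, win, W/win, win, C)
    return x.reshape(H, W, C)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(x, win):
    """Scaled dot-product self-attention restricted to each window.

    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    H, W, C = x.shape
    tokens = window_partition(x, win)                          # (N, T, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)   # (N, T, T)
    weights = softmax(scores, axis=-1)
    return window_merge(weights @ tokens, win, H, W)


def cross_scale_window_attention(x, window_sizes=(2, 4)):
    """Hypothetical cross-scale variant: average the window attention
    computed at several window sizes, then add a residual connection."""
    outs = [window_attention(x, w) for w in window_sizes]
    return x + np.mean(outs, axis=0)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 4))  # toy (H, W, C) feature map
out = cross_scale_window_attention(features)
```

Because attention is computed only among the tokens of one window, each output position mixes information from its local neighborhood; using more than one window size lets the same position draw on neighborhoods of different extents, which is one plausible reading of how such a module enlarges the receptive field.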