IV CV Apr 26, 2023

Multi-Modality Deep Network for Extreme Learned Image Compression

Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen

arXiv:2304.13583v118.631 citations h-index: 36

Originality: Incremental advance

AI Analysis

This work addresses poor image quality in compression for low-bandwidth applications, though it is an incremental advance that builds on existing multimodal approaches.

The paper tackles the problem of severe semantics loss in learned image compression at extremely low bitrates by proposing a multimodal method that uses text descriptions as prior information to guide compression, achieving visually pleasing results comparable to state-of-the-art methods at 2x to 4x higher bitrates.

Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but suffer from blur and severe semantics loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression for better compression performance. We fully study the role of text description in different components of the codec, and demonstrate its effectiveness. In addition, we adopt the image-text attention module and image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions. Extensive experiments, including a user study, prove that our method can obtain visually pleasing results at extremely low bitrates, and achieves a comparable or even better performance than state-of-the-art methods, even though these methods are at 2x to 4x bitrates of ours.
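The abstract does not spell out how the image-text attention module works. As a rough illustration of the general idea of fusing image and text features, the sketch below implements plain scaled dot-product cross-attention, in which each image location attends over the text tokens. This is an assumption for illustration only, not the authors' module: the function names `softmax` and `cross_attention` are hypothetical, there are no learned projections, and the residual addition is a simplification chosen here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_feats, text_feats):
    """Fuse text information into image features (illustrative only).

    image_feats: list of image-location vectors, used as queries.
    text_feats:  list of text-token vectors, used as keys and values.
    Both must share the same feature dimension. A real module would
    apply learned query/key/value projections first; they are omitted
    here to keep the sketch short.
    """
    dim = len(text_feats[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in image_feats:
        # Similarity of this image location to every text token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in text_feats]
        weights = softmax(scores)
        # Attention-weighted average of the text token vectors.
        context = [sum(w * tok[d] for w, tok in zip(weights, text_feats))
                   for d in range(dim)]
        # Residual connection: keep the image feature, add text context.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


if __name__ == "__main__":
    image_feats = [[1.0, 0.0], [0.0, 1.0]]  # two image locations
    text_feats = [[1.0, 0.0], [0.0, 1.0]]   # two text tokens
    for row in cross_attention(image_feats, text_feats):
        print(row)
```

In this toy input each image location puts more weight on the text token it is most similar to, so the fused feature is pulled toward that token. How the paper's actual modules are structured would have to be taken from the full text.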
