CV AI IV | Feb 26, 2024

MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang

arXiv:2402.16749v321.842 citations | h-index: 30 | Has Code | IEEE Transactions on Image Processing

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient image compression in storage and communication, with potential applications in next-generation systems, though it appears incremental because it builds on existing LMM advances.

The paper tackles ultra-low bitrate image compression, where existing methods sacrifice either consistency with the ground truth or perceptual quality. It proposes MISC, a method that uses a Large Multimodal Model to balance the two goals, and reports optimal consistency and perception results while saving 50% bitrate.

With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a topic in high demand. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrates. In recent years, the rapid development of Large Multimodal Models (LMMs) has made it possible to balance these two goals. To this end, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to those semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information. Experimental results show that the proposed MISC is suitable for compressing both traditional Natural Scene Images (NSIs) and emerging AI-Generated Images (AIGIs). It achieves optimal consistency and perception results while saving 50% bitrate, which gives it strong potential for applications in the next generation of storage and communication. The code will be released at https://github.com/lcysyzxdxc/MISC.
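The four-component data flow described in the abstract can be sketched as follows. This is a toy illustration only, not the authors' implementation: every name here (`Payload`, `lmm_encode`, `map_encode`, `image_encode`, `decode`, `bits_per_pixel`) is hypothetical, the LMM is replaced by a fixed placeholder caption, and the decoder is a simple nearest-neighbour upsampler rather than a generative model. It only shows how three separate pieces of information might be bundled into one payload and how bitrate in bits per pixel could be measured.

```python
from dataclasses import dataclass
from typing import List

# Toy stand-in for an image: rows of 0-255 grayscale values.
Image = List[List[int]]


@dataclass
class Payload:
    text: bytes        # semantic description (stand-in for LMM encoder output)
    region_map: bytes  # coarse map locating where the semantics apply
    image_bits: bytes  # heavily compressed image bitstream

    def total_bits(self) -> int:
        return 8 * (len(self.text) + len(self.region_map) + len(self.image_bits))


def lmm_encode(image: Image) -> bytes:
    # Assumption: a real system would query a large multimodal model here.
    return b"a placeholder caption"


def map_encode(image: Image, block: int = 4) -> bytes:
    # Assumption: one byte per block, 1 when the block mean exceeds 127.
    h, w = len(image), len(image[0])
    out = bytearray()
    for y in range(0, h, block):
        for x in range(0, w, block):
            vals = [image[j][i]
                    for j in range(y, min(y + block, h))
                    for i in range(x, min(x + block, w))]
            out.append(1 if sum(vals) / len(vals) > 127 else 0)
    return bytes(out)


def image_encode(image: Image, stride: int = 4) -> bytes:
    # Assumption: keep every stride-th pixel in both directions.
    return bytes(image[y][x]
                 for y in range(0, len(image), stride)
                 for x in range(0, len(image[0]), stride))


def compress(image: Image) -> Payload:
    return Payload(lmm_encode(image), map_encode(image), image_encode(image))


def decode(payload: Payload, h: int, w: int, stride: int = 4) -> Image:
    # Toy decoder: nearest-neighbour upsampling of image_bits only.
    # A real decoder would also condition on payload.text and payload.region_map.
    cols = (w + stride - 1) // stride
    return [[payload.image_bits[(y // stride) * cols + (x // stride)]
             for x in range(w)]
            for y in range(h)]


def bits_per_pixel(payload: Payload, h: int, w: int) -> float:
    return payload.total_bits() / (h * w)
```

On a tiny test image the fixed-size caption dominates the payload, so the resulting bits-per-pixel figure is not representative of the ultra-low bitrates the paper targets; the sketch is meant to show the structure of the pipeline, not its compression performance.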

