Donghui Feng

CV
h-index30
4papers
72citations
Novelty55%
AI Score41

4 Papers

CVFeb 26, 2024Code
MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Chunyi Li, Guo Lu, Donghui Feng et al.

With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To solve this problem, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantic, an image encoder generates an extremely compressed bitstream, and a decoder reconstructs the image based on the above information. Experimental results show that our proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs) content. It can achieve optimal consistency and perception results while saving 50% bitrate, which has strong potential applications in the next generation of storage and communication. The code will be released on https://github.com/lcysyzxdxc/MISC.

CVFeb 9, 2025Code
Linear Attention Modeling for Learned Image Compression

Donghui Feng, Zhengxue Cheng, Shen Wang et al.

Recent years, learned image compression has made tremendous progress to achieve impressive coding efficiency. Its coding gain mainly comes from non-linear neural network-based transform and learnable entropy modeling. However, most studies focus on a strong backbone, and few studies consider a low complexity design. In this paper, we propose LALIC, a linear attention modeling for learned image compression. Specially, we propose to use Bi-RWKV blocks, by utilizing the Spatial Mix and Channel Mix modules to achieve more compact feature extraction, and apply the Conv based Omni-Shift module to adapt to two-dimensional latent representation. Furthermore, we propose a RWKV-based Spatial-Channel ConTeXt model (RWKV-SCCTX), that leverages the Bi-RWKV to modeling the correlation between neighboring features effectively. To our knowledge, our work is the first work to utilize efficient Bi-RWKV models with linear attention for learned image compression. Experimental results demonstrate that our method achieves competitive RD performances by outperforming VTM-9.1 by -15.26%, -15.41%, -17.63% in BD-rate on Kodak, CLIC and Tecnick datasets. The code is available at https://github.com/sjtu-medialab/RwkvCompress .

CVFeb 2, 2025Code
S2CFormer: Revisiting the RD-Latency Trade-off in Transformer-based Learned Image Compression

Yunuo Chen, Qian Li, Bing He et al.

Transformer-based Learned Image Compression (LIC) suffers from a suboptimal trade-off between decoding latency and rate-distortion (R-D) performance. Moreover, the critical role of the FeedForward Network (FFN)-based channel aggregation module has been largely overlooked. Our research reveals that efficient channel aggregation-rather than complex and time-consuming spatial operations-is the key to achieving competitive LIC models. Based on this insight, we initiate the ``S2CFormer'' paradigm, a general architecture that simplifies spatial operations and enhances channel operations to overcome the previous trade-off. We present two instances of the S2CFormer: S2C-Conv, and S2C-Attention. Both models demonstrate state-of-the-art (SOTA) R-D performance and significantly faster decoding speed. Furthermore, we introduce S2C-Hybrid, an enhanced variant that maximizes the strengths of different S2CFormer instances to achieve a better performance-latency trade-off. This model outperforms all the existing methods on the Kodak, Tecnick, and CLIC Professional Validation datasets, setting a new benchmark for efficient and high-performance LIC. The code is at \href{https://github.com/YunuoChen/S2CFormer}{https://github.com/YunuoChen/S2CFormer}.

CVJun 13, 2024
CMC-Bench: Towards a New Paradigm of Visual Signal Compression

Chunyi Li, Xiele Wu, Haoning Wu et al.

Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in consistency with the original image and perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. This benchmark covers 18,000 and 40,000 images respectively to verify 6 mainstream I2T and 12 T2I models, including 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal codecs; meanwhile, it highlights where LMMs can be further optimized toward the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.