IV · CV · Dec 17, 2024

Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression

arXiv:2412.12982v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient compression of AI-generated images. It is an incremental advance in a domain-specific area with limited prior research.

The paper tackles the problem of compressing AI-generated images by introducing a scalable cross-modal framework that encodes images into layered bitstreams (semantic, structural, texture) and uses Stable Diffusion as a decoder, achieving competitive restoration at extremely low bitrates (<0.02 bpp).

Recent advances in Artificial Intelligence Generated Content (AIGC) have garnered significant interest, accompanied by an increasing need to transmit and compress the vast number of AI-generated images (AIGIs). However, there is a noticeable deficiency in research focused on compression methods for AIGIs. To address this critical gap, we introduce a scalable cross-modal compression framework that incorporates multiple human-comprehensible modalities, designed to efficiently capture and relay essential visual information for AIGIs. In particular, our framework encodes images into a layered bitstream consisting of a semantic layer that delivers high-level semantic information through text prompts; a structural layer that captures spatial details using edge or skeleton maps; and a texture layer that preserves local textures via a colormap. Utilizing Stable Diffusion as the backend, the framework leverages these multimodal priors for image generation, effectively functioning as a decoder when these priors are encoded. Qualitative and quantitative results show that our method proficiently restores both semantic and visual details, competing with baseline approaches at extremely low bitrates (<0.02 bpp). Additionally, our framework facilitates downstream editing applications without requiring full decoding, thereby paving a new direction for future research in AIGI compression.
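To give a sense of how tight a budget of <0.02 bpp is, the sketch below works through the arithmetic for a three-layer bitstream. It is a minimal illustration, not the paper's codec: the `LayeredBitstream` class, the `bit_budget` helper, the 512×512 resolution, and the per-layer byte counts are all assumptions chosen for the example. Only the layer names and the 0.02 bpp figure come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class LayeredBitstream:
    """Hypothetical container for the three layers named in the abstract."""
    semantic: bytes    # encoded text prompt
    structural: bytes  # encoded edge or skeleton map
    texture: bytes     # encoded colormap

    def total_bits(self) -> int:
        # Sum the payload of all three layers, in bits.
        n_bytes = len(self.semantic) + len(self.structural) + len(self.texture)
        return 8 * n_bytes

    def bpp(self, width: int, height: int) -> float:
        # Bits per pixel = total bits / number of pixels in the image.
        return self.total_bits() / (width * height)


def bit_budget(width: int, height: int, target_bpp: float) -> int:
    """Largest whole number of bits that stays within target_bpp."""
    return int(width * height * target_bpp)


# Illustrative payload sizes only (assumed, not measured from the paper).
stream = LayeredBitstream(
    semantic=bytes(60),     # 60-byte prompt
    structural=bytes(400),  # 400-byte edge map
    texture=bytes(150),     # 150-byte colormap
)

budget = bit_budget(512, 512, 0.02)
print(f"budget: {budget} bits ({budget // 8} bytes)")
print(f"stream: {stream.total_bits()} bits, {stream.bpp(512, 512):.4f} bpp")
```

Under these assumed sizes, a 512×512 image has a budget of roughly 650 bytes across all three layers, which explains why compact modalities such as text prompts, sparse edge maps, and coarse colormaps are attractive carriers at this bitrate.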

