Perception-Oriented Latent Coding for High-Performance Compressed Domain Semantic Inference
This work addresses the need for efficient and effective semantic inference in the compressed domain, particularly for vision tasks, and offers an incremental improvement over existing methods.
The paper tackles the problem of limited semantic richness in compressed domain semantic inference by introducing Perception-Oriented Latent Coding (POLC), which achieves rate-perception performance comparable to state-of-the-art methods and enhances vision task performance with minimal fine-tuning overhead.
In recent years, compressed domain semantic inference has primarily relied on learned image coding models optimized for mean squared error (MSE). However, MSE-oriented optimization tends to yield latent spaces with limited semantic richness, which hinders effective semantic inference in downstream tasks. Moreover, achieving high performance with these models often requires fine-tuning the entire vision model, which is computationally intensive, especially for large models. To address these problems, we introduce Perception-Oriented Latent Coding (POLC), an approach that enriches the semantic content of latent features for high-performance compressed domain semantic inference. Because its latent space is semantically rich, POLC requires only a plug-and-play adapter for fine-tuning, significantly reducing the number of fine-tuned parameters compared to previous MSE-oriented methods. Experimental results demonstrate that POLC achieves rate-perception performance comparable to state-of-the-art generative image coding methods while markedly enhancing performance in vision tasks, with minimal fine-tuning overhead. Code is available at https://github.com/NJUVISION/POLC.