IV · CV · MM · Dec 6, 2024

SMIC: Semantic Multi-Item Compression based on CLIP dictionary

arXiv:2412.05035v1 · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficient compression for image collections, offering a domain-specific improvement over existing generative codecs.

The paper tackles image collection compression by leveraging CLIP's latent space to exploit inter-item redundancy, achieving a compression rate of around 10^-5 bits per pixel (BPP) per image without sacrificing semantic fidelity.
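To make the headline rate concrete, the short sketch below converts a bits-per-pixel figure into a total bit budget per image. The 512 × 512 resolution is an assumption chosen for illustration (this summary does not state the test resolution), as is the comparison against an uncompressed 512-dimensional float32 embedding, which is the output size of CLIP ViT-B/32.

```python
def bits_per_image(bpp: float, width: int, height: int) -> float:
    """Total bits spent on one image at a given bits-per-pixel rate."""
    return bpp * width * height


def raw_embedding_bits(dim: int, bits_per_value: int = 32) -> int:
    """Size of an uncompressed embedding vector (float32 values by default)."""
    return dim * bits_per_value


if __name__ == "__main__":
    # Assumed resolution for illustration only; not taken from the paper.
    budget = bits_per_image(1e-5, 512, 512)
    print(f"bits per image: {budget:.2f}")  # 2.62
    # Assumed embedding size: 512 dimensions, as in CLIP ViT-B/32.
    raw = raw_embedding_bits(512)
    print(f"raw 512-d float32 embedding: {raw} bits")  # 16384
    print(f"ratio: {raw / budget:.0f}x")  # 6250x
```

Under these assumptions, an image gets under three bits, thousands of times less than sending even a raw embedding. That gap is why a per-image code must be a handful of symbols, with most of the information shared across the collection.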

Semantic compression, a compression scheme in which the distortion metric (typically MSE) is replaced with a semantic fidelity metric, is becoming increasingly popular. Most recent semantic compression schemes rely on the CLIP foundation model. In this work, we extend such a scheme to image collection compression, where inter-item redundancy is taken into account during the coding phase. To that end, we first show that CLIP's latent space allows for easy semantic additions and subtractions. Building on this property, we define a dictionary-based multi-item codec that outperforms state-of-the-art generative codecs in terms of compression rate, reaching around $10^{-5}$ BPP per image, while not sacrificing semantic fidelity. We also show that the learned dictionary is semantic in nature and acts as a projector for the semantic content of images.
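The abstract does not spell out the codec's internals. As a rough illustration of how a shared dictionary can exploit additive structure in an embedding space, the sketch below (a simplification of our own, not the authors' algorithm) codes each image embedding as a few signed dictionary atoms chosen by greedy matching pursuit on cosine similarity. Only atom indices and signs are sent per image; the dictionary is shared by the whole collection. Random unit vectors stand in for both CLIP embeddings and a learned dictionary. The names `encode`, `decode`, and `code_bits` are hypothetical, and the generative decoder that would turn a reconstructed embedding back into an image is omitted.

```python
import math
import random

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Vector) -> Vector:
    """Scale to unit L2 norm (CLIP embeddings are compared by cosine)."""
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return y + alpha * x."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def encode(target: Vector, dictionary: list[Vector], n_terms: int) -> list[tuple[int, int]]:
    """Greedily pick (atom index, sign) pairs whose signed sum best matches target."""
    target = normalize(target)
    current = [0.0] * len(target)
    code: list[tuple[int, int]] = []
    for _ in range(n_terms):
        best, best_score = (0, 1), -math.inf
        for idx, atom in enumerate(dictionary):
            for sign in (1, -1):  # semantic addition or subtraction of the atom
                cand = axpy(sign, atom, current)
                n = math.sqrt(dot(cand, cand))
                if n == 0.0:  # a zero vector has no direction; skip it
                    continue
                score = dot(cand, target) / n  # cosine similarity to the target
                if score > best_score:
                    best, best_score = (idx, sign), score
        code.append(best)
        current = axpy(best[1], dictionary[best[0]], current)
    return code


def decode(code: list[tuple[int, int]], dictionary: list[Vector]) -> Vector:
    """Rebuild a unit-norm embedding from the transmitted (index, sign) pairs."""
    total = [0.0] * len(dictionary[0])
    for idx, sign in code:
        total = axpy(sign, dictionary[idx], total)
    return normalize(total)


def code_bits(n_terms: int, dict_size: int) -> int:
    """Per-image cost: an index plus one sign bit per term (no entropy coding)."""
    return n_terms * (math.ceil(math.log2(dict_size)) + 1)


def random_unit(dim: int, rng: random.Random) -> Vector:
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


if __name__ == "__main__":
    rng = random.Random(0)
    dim, k = 512, 64
    dictionary = [random_unit(dim, rng) for _ in range(k)]
    # Stand-in "image embedding": one concept added, another subtracted.
    target = normalize(axpy(-1, dictionary[10], dictionary[3]))
    code = encode(target, dictionary, n_terms=2)
    print(code, round(dot(decode(code, dictionary), target), 4), code_bits(2, k))
```

With 64 atoms and two terms, each image costs 14 bits in this toy scheme, while the cost of sending the dictionary is paid once and amortized over every image in the collection, which is the kind of inter-item saving the abstract describes.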

