Adaptive Transform Coding for Semantic Compression
This work addresses the need for efficient compression of machine-oriented visual representations, offering a flexible and interpretable alternative to neural compression for downstream inference tasks.
The paper proposes an adaptive transform-coding method for compressing semantic features from vision models, achieving rate-distortion performance that is competitive with or better than neural compression methods while maintaining interpretability.
Visual data compression is shifting from human-centered reconstruction to machine-oriented representation coding. In this setting, an image is often mapped to a compact semantic embedding, which is then compressed and transmitted for downstream inference. We propose an adaptive transform-coding method for semantic-feature compression motivated by the conditional rate-distortion function of a Gaussian mixture model. The scheme uses mode-dependent transforms and quantizers selected according to the inferred source component, enabling more efficient coding of heterogeneous feature distributions. Evaluations on features from widely used vision backbones and foundation models show that the proposed method outperforms or is competitive with state-of-the-art neural compression methods while preserving flexibility and interpretability.
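The idea of mode-dependent transforms and quantizers can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions the abstract does not state: the mixture parameters (weights, means, covariances) are already known, the per-mode transform is a Karhunen-Loève transform taken from the eigenvectors of that mode's covariance, the quantizer is a uniform scalar quantizer with one hypothetical step size per mode, and the mode is inferred by maximum posterior probability. The names `ModeCodec`, `encode_feature`, and `decode_feature` are illustrative only, and entropy coding of the indices is omitted.

```python
import numpy as np


class ModeCodec:
    """Transform and quantizer for one mixture component (illustrative assumption)."""

    def __init__(self, mean, cov, step):
        self.mean = np.asarray(mean, dtype=float)
        cov = np.asarray(cov, dtype=float)
        # Columns of `basis` are orthonormal eigenvectors of the symmetric covariance.
        _, self.basis = np.linalg.eigh(cov)
        self.step = float(step)  # hypothetical uniform quantizer step for this mode
        self.cov_inv = np.linalg.inv(cov)
        self.logdet = np.linalg.slogdet(cov)[1]

    def log_likelihood(self, x):
        # Gaussian log-density up to an additive constant shared by all modes.
        d = x - self.mean
        return -0.5 * (d @ self.cov_inv @ d + self.logdet)

    def encode(self, x):
        # Decorrelate with the mode's transform, then quantize each coefficient.
        coeffs = self.basis.T @ (x - self.mean)
        return np.round(coeffs / self.step).astype(int)

    def decode(self, q):
        # Dequantize, invert the orthonormal transform, restore the mean.
        return self.basis @ (q * self.step) + self.mean


def encode_feature(x, codecs, weights):
    """Pick the most probable mode, then code x with that mode's transform."""
    scores = [np.log(w) + c.log_likelihood(x) for c, w in zip(codecs, weights)]
    k = int(np.argmax(scores))
    return k, codecs[k].encode(x)


def decode_feature(k, q, codecs):
    """Reconstruct the feature from the mode index and quantization indices."""
    return codecs[k].decode(q)


# Toy two-mode example in two dimensions (all numbers are made up).
codecs = [
    ModeCodec([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], step=0.5),
    ModeCodec([10.0, 10.0], [[1.0, -0.5], [-0.5, 2.0]], step=0.25),
]
weights = [0.5, 0.5]
x = np.array([9.7, 10.4])
k, q = encode_feature(x, codecs, weights)
x_hat = decode_feature(k, q, codecs)
```

Because each transform is orthonormal, the reconstruction error in feature space equals the quantization error in the transform domain, so each coefficient contributes at most half a step. In a fuller design the step sizes would be chosen per mode and per coefficient to meet a rate or distortion target, which this sketch leaves out.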