IV · CV · Apr 3, 2025

Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization

Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega

arXiv:2504.02216v2 · 8.63 citations · h-index: 12 · IEEE Transactions on Multimedia

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient image compression in distributed applications where images are processed primarily by algorithms. It offers incremental improvements over existing coding methods.

The paper tackles compression of images for machine vision tasks. It proposes rate-distortion optimization (RDO) that uses feature distances as the distortion metric, and it achieves up to 17% bit-rate savings at the same task accuracy compared with RDO based on the sum of squared errors (SSE).
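As a hedged illustration in assumed notation (not necessarily the paper's symbols), let f be the feature extractor, x and x̂ the original and decoded images, R the bit rate, λ a Lagrange multiplier, and J_f(x) the Jacobian of f at x. A feature-distance RDO objective and its first-order Taylor linearization around x can then be written as:

```latex
\begin{aligned}
&\min_{\text{coding choices}} \; \|f(\mathbf{x}) - f(\hat{\mathbf{x}})\|_2^2 + \lambda R, \\
&f(\hat{\mathbf{x}}) \approx f(\mathbf{x}) + \mathbf{J}_f(\mathbf{x})\,(\hat{\mathbf{x}} - \mathbf{x}) \\
&\Rightarrow\; \|f(\mathbf{x}) - f(\hat{\mathbf{x}})\|_2^2 \approx (\mathbf{x} - \hat{\mathbf{x}})^\top \mathbf{J}_f(\mathbf{x})^\top \mathbf{J}_f(\mathbf{x})\,(\mathbf{x} - \hat{\mathbf{x}}).
\end{aligned}
```

The matrix J_f(x)^T J_f(x) couples errors across the whole image. This suggests why a block-based encoder needs a further block-wise approximation before it can evaluate such a metric.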

Many images and videos are processed primarily by computer vision algorithms, with only occasional human inspection. When this content must be compressed before processing, e.g., in distributed applications, coding methods must optimize for both visual quality and downstream task performance. We first show theoretically that one way to reduce the effect of compression on a given task loss is to perform rate-distortion optimization (RDO) using, as the distortion metric, the distance between features obtained from the original and the decoded images. However, directly optimizing such a rate-distortion objective is computationally impractical. It requires iteratively encoding and decoding the entire image, plus feature evaluation, for each possible coding configuration. We address this problem by simplifying the RDO formulation so that the distortion term is computable with block-based encoders. We first apply a Taylor expansion to the feature extractor, recasting the feature distance as a quadratic metric involving the Jacobian matrix of the neural network. Then, we replace the linearized metric with a block-wise approximation, which we call input-dependent squared error (IDSE). To make the metric computable, we approximate IDSE using sketches of the Jacobian. The resulting loss can be evaluated block-wise in the transform domain and combined with the sum of squared errors (SSE) to address both visual quality and computer vision performance. Simulations with AVC and HEVC across multiple feature extractors and downstream networks show up to 17% bit-rate savings for the same task accuracy compared with RDO based on SSE, with no decoder complexity overhead and a small (7.86%) encoder complexity increase.
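The block-wise, sketched metric described in the abstract can be illustrated with a small NumPy sketch. Everything in it is an assumption for illustration, and none of the function names come from the paper. A toy extractor f(x) = tanh(Wx) stands in for a neural network, and "blocks" are contiguous 4-sample segments of a 1-D signal. IDSE is taken to be the sum of per-block quadratic forms e_b^T J_b^T J_b e_b, which is one plausible reading of the block-wise approximation that drops cross-block terms. The Jacobian sketch is a Gaussian random projection S with E[S^T S] = I, and the weight `mu` that combines SSE and IDSE is hypothetical.

```python
import numpy as np

# Hypothetical toy setup: a 1-D "image" of 16 samples split into 4 blocks of 4,
# and a stand-in feature extractor f(x) = tanh(W x) with 32 outputs.
rng = np.random.default_rng(0)
N_PIX, N_FEAT, BLOCK = 16, 32, 4
W = rng.standard_normal((N_FEAT, N_PIX)) / np.sqrt(N_PIX)


def features(x):
    return np.tanh(W @ x)


def jacobian(x):
    # Analytic Jacobian of tanh(W x): diag(1 - tanh(W x)^2) @ W.
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W


def feature_distance(x, x_hat):
    # Exact squared feature distance ||f(x) - f(x_hat)||^2.
    d = features(x) - features(x_hat)
    return float(d @ d)


def blockwise_idse(J, err, block=BLOCK):
    # Assumed IDSE form: sum over blocks of err_b^T J_b^T J_b err_b, where J_b
    # holds the Jacobian columns of block b; cross-block terms are dropped.
    # With block = err.size this is the full linearized metric err^T J^T J err.
    total = 0.0
    for s in range(0, err.size, block):
        v = J[:, s:s + block] @ err[s:s + block]
        total += float(v @ v)
    return total


def sketch_jacobian(J, k, rng):
    # Gaussian sketch S (k x n_feat) with E[S^T S] = I, so (S J)^T (S J)
    # is an unbiased estimate of J^T J using k rows instead of n_feat.
    S = rng.standard_normal((k, J.shape[0])) / np.sqrt(k)
    return S @ J


def dct_matrix(n):
    # Orthonormal DCT-II matrix (rows are basis vectors), so T @ T.T = I.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    T = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    T[0] /= np.sqrt(2.0)
    return T


def block_cost_in_transform_domain(Jb, coeff_err, T):
    # With coefficient error c_b = T err_b we have err_b = T^T c_b, so
    # err_b^T Q_b err_b = c_b^T (T Q_b T^T) c_b with Q_b = J_b^T J_b.
    Q = T @ (Jb.T @ Jb) @ T.T
    return float(coeff_err @ Q @ coeff_err)


def combined_distortion(J, err, mu):
    # SSE for visual quality plus a hypothetical weight mu times IDSE.
    return float(err @ err) + mu * blockwise_idse(J, err)


x = rng.standard_normal(N_PIX)
err = 1e-3 * rng.standard_normal(N_PIX)  # small stand-in for coding error
x_hat = x + err
J = jacobian(x)

exact = feature_distance(x, x_hat)
linearized = blockwise_idse(J, err, block=N_PIX)  # full quadratic form
idse = blockwise_idse(J, err)                     # block-diagonal approximation
idse_sketched = blockwise_idse(sketch_jacobian(J, k=8, rng=rng), err)
T = dct_matrix(BLOCK)
cost0 = block_cost_in_transform_domain(J[:, :BLOCK], T @ err[:BLOCK], T)
print(exact, linearized, idse, idse_sketched, cost0)
```

Because the DCT basis is orthonormal, the per-block cost is the same whether it is computed on sample errors or on transform-coefficient errors. That is one way an encoder could score candidate quantized coefficients directly. In this toy, the sketch only reduces the number of rows used to form J_b^T J_b (8 instead of 32).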
