CV HC LG, Jul 7, 2021

Pragmatic Image Compression for Human-in-the-Loop Decision-Making

Siddharth Reddy, Anca D. Dragan, Sergey Levine

arXiv:2108.04219v16.514 citations, Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient image transmission in interactive applications such as e-commerce. The advance is incremental, since it builds on existing compression and human-in-the-loop methods.

The paper aims to reduce image bitrates for human-in-the-loop tasks by training a compression model to preserve only the features that drive user decisions. In experiments with human participants, it matches user actions at lower bitrates than baselines on tasks such as digit reading and online shopping.

Standard lossy image compression algorithms aim to preserve an image's appearance, while minimizing the number of bits needed to transmit it. However, the amount of information actually needed by a user for downstream tasks -- e.g., deciding which product to click on in a shopping website -- is likely much lower. To achieve this lower bitrate, we would ideally only transmit the visual features that drive user behavior, while discarding details irrelevant to the user's decisions. We approach this problem by training a compression model through human-in-the-loop learning as the user performs tasks with the compressed images. The key insight is to train the model to produce a compressed image that induces the user to take the same action that they would have taken had they seen the original image. To approximate the loss function for this model, we train a discriminator that tries to distinguish whether a user's action was taken in response to the compressed image or the original. We evaluate our method through experiments with human participants on four tasks: reading handwritten digits, verifying photos of faces, browsing an online shopping catalogue, and playing a car racing video game. The results show that our method learns to match the user's actions with and without compression at lower bitrates than baseline methods, and adapts the compression model to the user's behavior: it preserves the digit number and randomizes handwriting style in the digit reading task, preserves hats and eyeglasses while randomizing faces in the photo verification task, preserves the perceived price of an item while randomizing its color and background in the online shopping task, and preserves upcoming bends in the road in the car racing game.
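The discriminator idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a simulated user in place of human participants, binary "digits" in place of images, a logistic-regression discriminator over a single hand-picked feature, and two fixed hypothetical compressors instead of a learned one. All function names here (`simulated_user`, `good_compressor`, `bad_compressor`, `train_discriminator`, `compressor_loss`) are made up for illustration. The discriminator sees the original input and the user's action, and estimates whether that action was taken in response to the original; a compressor is then scored by how often its induced actions look like original-image actions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulated_user(seen_digit, rng):
    """Hypothetical stand-in for a human: reports the digit shown,
    or guesses at random if the digit was erased (None)."""
    return seen_digit if seen_digit is not None else rng.randint(0, 1)


def good_compressor(x):
    """Assumed compressor that keeps the decision-relevant feature."""
    return x


def bad_compressor(x):
    """Assumed compressor that discards the decision-relevant feature."""
    return None


def features(x, a):
    # Bias term plus one assumed feature: does the action agree with the original?
    return [1.0, 1.0 if a == x else 0.0]


def discriminator(w, x, a):
    """Estimated probability that action a was a response to the original x."""
    f = features(x, a)
    return sigmoid(w[0] * f[0] + w[1] * f[1])


def train_discriminator(original_pairs, compressed_pairs, steps=500, lr=0.5):
    """Logistic regression by full-batch gradient ascent on the log-likelihood.
    Label 1: action taken on the original. Label 0: action taken on the compressed input."""
    data = [(features(x, a), 1.0) for x, a in original_pairs]
    data += [(features(x, a), 0.0) for x, a in compressed_pairs]
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for f, y in data:
            p = sigmoid(w[0] * f[0] + w[1] * f[1])
            for i in range(2):
                grad[i] += (y - p) * f[i]
        w = [wi + lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


def compressor_loss(w, compressed_pairs):
    """Mean of -log D(x, a): low when compressed-image actions look like original-image actions."""
    total = sum(-math.log(discriminator(w, x, a)) for x, a in compressed_pairs)
    return total / len(compressed_pairs)


rng = random.Random(0)
digits = [rng.randint(0, 1) for _ in range(200)]

# (original input, action) pairs under each viewing condition.
original_pairs = [(x, simulated_user(x, rng)) for x in digits]
bad_pairs = [(x, simulated_user(bad_compressor(x), rng)) for x in digits]
good_pairs = [(x, simulated_user(good_compressor(x), rng)) for x in digits]

# Train the discriminator against the lossy compressor, then score both compressors.
w = train_discriminator(original_pairs, bad_pairs)
good_loss = compressor_loss(w, good_pairs)
bad_loss = compressor_loss(w, bad_pairs)
print("loss (keeps digit):", good_loss)
print("loss (erases digit):", bad_loss)
```

In this toy setting the compressor that erases the digit receives the higher loss, because the actions it induces are easier to tell apart from actions taken on the original. A learned compressor would be updated to reduce this loss, with the discriminator retrained as the compressor changes; that alternating loop is omitted here for brevity.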
