CV · AI · Feb 13, 2024

Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss

Kei Iino, Shunsuke Akamatsu, Hiroshi Watanabe, Shohei Enomoto, Akira Sakamoto, Takeharu Eda

arXiv:2402.08267v22.0 · h-index: 7 · ICIP

Originality: Incremental advance

AI Analysis

This work addresses efficient image compression for machine analysis. Its training method sidesteps two weaknesses of existing approaches: the difficulty of optimizing with task loss through a deep recognition model, and the extra overhead that ROI-based methods incur. Within learned image coding for machines, the contribution is incremental.

The paper proposes a training method for learned image coding for machines that applies an auxiliary loss to the encoder, improving its recognition capability and rate-distortion performance. Compared with the conventional training method, it achieves Bjontegaard Delta rate improvements of 27.7% in object detection and 20.3% in semantic segmentation.
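For readers unfamiliar with the Bjontegaard Delta rate, the following is a minimal sketch of the standard calculation: fit log-bitrate as a cubic polynomial of the quality metric for each codec, average both fits over the shared quality range, and convert the gap to a percentage. It is not taken from the paper; the function name `bd_rate`, its argument names, and the choice of task accuracy (for example mAP or mIoU) as the quality axis are assumptions made for illustration. In this convention a negative value means the tested method needs fewer bits at equal quality, so a reported improvement corresponds to a negative result.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate change (%) of a test codec relative to an anchor
    at equal quality. Hypothetical helper, not the paper's code."""
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log10(rate) as a cubic polynomial of quality for each curve.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Compare only over the quality interval covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())

    # Mean log-rate of each fit over [lo, hi], via the antiderivative.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the mean log-rate gap back to a percentage.
    return (10.0 ** (avg_t - avg_a) - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up rate/accuracy points, for illustration only.
    quality = [30.0, 34.0, 38.0, 42.0]   # e.g. mAP in percent
    anchor_bpp = [0.10, 0.20, 0.40, 0.80]
    test_bpp = [0.8 * r for r in anchor_bpp]
    print(bd_rate(anchor_bpp, quality, test_bpp, quality))
```

With the made-up points above, the test codec uses 20% fewer bits at every quality level, so the function returns approximately -20. Each curve needs at least four rate-quality points for the cubic fit to be well determined.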

Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM: optimization of the compression model based on task loss, and Region of Interest (ROI)-based bit allocation. These approaches provide the encoder with recognition capability. However, optimization with task loss becomes difficult when the recognition model is deep, and ROI-based methods often involve extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies an auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, respectively, compared to the conventional training method.

© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
