CV · IV · Mar 11, 2022

Saliency-Driven Versatile Video Coding for Neural Object Detection

arXiv:2203.05944v1 · 28 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses efficient video compression for machine vision tasks, offering a domain-specific improvement for applications like surveillance or autonomous systems.

The paper tackles the problem of video coding for machines by proposing a saliency-driven framework using VVC, achieving up to 29% bitrate savings with the same object detection accuracy compared to constant quality encoding.

Saliency-driven image and video coding for humans has gained importance in the recent past. In this paper, we propose such a saliency-driven coding framework for the video coding for machines task using the latest video coding standard Versatile Video Coding (VVC). To determine the salient regions before encoding, we employ the real-time-capable object detection network You Only Look Once (YOLO) in combination with a novel decision criterion. To measure the coding quality for a machine, the state-of-the-art object segmentation network Mask R-CNN was applied to the decoded frame. From extensive simulations we find that, compared to the reference VVC with a constant quality, up to 29% of bitrate can be saved with the same detection accuracy at the decoder side by applying the proposed saliency-driven framework. Besides, we compare YOLO against other, more traditional saliency detection methods.
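
The general idea of saliency-driven coding described in the abstract can be illustrated with a minimal sketch: detector bounding boxes are mapped onto the encoder's block grid, and blocks that contain detected objects receive a lower quantization parameter (higher quality) than the background. Everything in this sketch is an assumption made for illustration and is not taken from the paper: the function name `saliency_qp_map`, the 128-pixel block size, the two QP values, and the simple rule that any overlap with a box marks a block as salient. In particular, it does not reproduce the paper's decision criterion, which the abstract does not specify.

```python
from typing import List, Tuple

# Bounding box as (x_min, y_min, x_max, y_max) in pixels; the max edges are exclusive.
Box = Tuple[int, int, int, int]


def saliency_qp_map(
    width: int,
    height: int,
    boxes: List[Box],
    ctu_size: int = 128,      # assumed block size (hypothetical default)
    qp_salient: int = 27,     # assumed QP for salient blocks (hypothetical)
    qp_background: int = 37,  # assumed QP for background blocks (hypothetical)
) -> List[List[int]]:
    """Return a row-major grid holding one QP value per block.

    A block is treated as salient if any bounding box overlaps it.
    This overlap rule is a placeholder, not the paper's criterion.
    """
    cols = -(-width // ctu_size)   # ceiling division
    rows = -(-height // ctu_size)
    qp_map = [[qp_background] * cols for _ in range(rows)]

    for x0, y0, x1, y1 in boxes:
        # Clip the box to the frame and skip boxes that end up empty.
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width), min(y1, height)
        if x1 <= x0 or y1 <= y0:
            continue
        # Mark every block that the clipped box touches.
        for r in range(y0 // ctu_size, (y1 - 1) // ctu_size + 1):
            for c in range(x0 // ctu_size, (x1 - 1) // ctu_size + 1):
                qp_map[r][c] = qp_salient

    return qp_map


if __name__ == "__main__":
    # One hypothetical detection in a 512x256 frame.
    for row in saliency_qp_map(512, 256, [(100, 50, 200, 120)]):
        print(row)
```

In a pipeline of the kind the abstract outlines, the boxes would come from the object detector run before encoding, and the resulting map would be handed to the encoder as per-block quality control. How that hand-off works depends on the encoder and is left out here.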

