IV | CV | Aug 4, 2022

Scalable Video Coding for Humans and Machines

arXiv:2208.02512v1 | 19 citations | h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient video coding in applications such as surveillance and content moderation, where both humans and machines analyze video. It is an incremental improvement that combines conventional and DNN-based methods.

The paper tackles video coding for both human and machine vision by proposing a scalable framework that supports object detection in the base layer and human viewing in the enhancement layer. It achieves 13-19% bit savings on object detection compared to state-of-the-art codecs while remaining competitive in MS-SSIM for human vision.

Video content is watched not only by humans, but increasingly also by machines. For example, machine learning models analyze surveillance video for security and traffic monitoring, search through YouTube videos for inappropriate content, and so on. In this paper, we propose a scalable video coding framework that supports machine vision (specifically, object detection) through its base layer bitstream and human vision via its enhancement layer bitstream. The proposed framework includes components from both conventional and Deep Neural Network (DNN)-based video coding. The results show that on object detection, the proposed framework achieves 13-19% bit savings compared to state-of-the-art video codecs, while remaining competitive in terms of MS-SSIM on the human vision task.
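The layered structure described in the abstract can be illustrated with a small bit-accounting sketch. It assumes the usual scalable-coding arrangement in which the machine task decodes only the base layer while human viewing decodes the base and enhancement layers together. The names `ScalableBitstream`, `bits_needed`, and `bit_savings_percent` are hypothetical and do not come from the paper. The abstract does not say how the 13-19% figure was measured, so the savings function below shows only a plain relative difference, not the authors' evaluation procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalableBitstream:
    """Toy container for a two-layer bitstream (hypothetical, for illustration)."""
    base: bytes         # assumed to carry what object detection needs
    enhancement: bytes  # assumed to carry the refinement for human viewing


def bits_needed(stream: ScalableBitstream, task: str) -> int:
    """Return how many bits a receiver must fetch for the given task."""
    if task == "machine":
        # Machine vision reads the base layer only.
        return 8 * len(stream.base)
    if task == "human":
        # Human viewing builds on the base layer, so it reads both layers.
        return 8 * (len(stream.base) + len(stream.enhancement))
    raise ValueError(f"unknown task: {task!r}")


def bit_savings_percent(reference_bits: int, proposed_bits: int) -> float:
    """Relative bit savings of a proposed codec against a reference codec."""
    return 100.0 * (reference_bits - proposed_bits) / reference_bits


if __name__ == "__main__":
    # Made-up payload sizes, chosen only to exercise the functions.
    stream = ScalableBitstream(base=bytes(100), enhancement=bytes(300))
    print(bits_needed(stream, "machine"))
    print(bits_needed(stream, "human"))
    print(bit_savings_percent(reference_bits=1000, proposed_bits=850))
```

The point of the sketch is that a detection-only receiver never pays for the enhancement layer, which is where a scalable design can save bits relative to a codec that must deliver the full picture to every consumer.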

