IVCVJun 19, 2023

VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

arXiv:2306.10681v230 citationsh-index: 94
Originality Incremental advance
AI Analysis

This addresses the problem of video processing inefficiency for both human viewers and machine vision systems, representing an incremental improvement by integrating feature-based compression with direct task support.

The paper tackles the inefficiency of decoding videos to pixels for machine vision tasks by proposing VNVC, a neural video coding framework that learns compact representations supporting both reconstruction and direct enhancement/analysis, achieving high compression efficiency and satisfactory task performances with lower complexities.

Almost all digital videos are coded into compact representations before being transmitted. Such compact representations need to be decoded back to pixels before being displayed to humans and - as usual - before being enhanced/analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. Therefore, we propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis, thereby being versatile for both human and machine vision. Our VNVC framework has a feature-based compression loop. In the loop, one frame is encoded into compact representations and decoded to an intermediate feature that is obtained before performing reconstruction. The intermediate feature can be used as reference in motion compensation and motion estimation through feature-based temporal context mining and cross-domain motion encoder-decoder to compress the following frames. The intermediate feature is directly fed into video reconstruction, video enhancement, and video analysis networks to evaluate its effectiveness. The evaluation shows that our framework with the intermediate feature achieves high compression efficiency for video reconstruction and satisfactory task performances with lower complexities.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes