CV · Jul 24, 2021

Self-Conditioned Probabilistic Learning of Video Rescaling

arXiv:2107.11639v2 · 39 citations
Originality: Incremental advance
AI Analysis

This work addresses video storage and processing efficiency issues for multimedia applications, representing an incremental improvement over existing methods.

The paper tackles video rescaling with a self-conditioned probabilistic framework that learns the downscaling and upscaling procedures jointly. The resulting downscaled videos preserve more meaningful information, which improves both upscaling quality and downstream tasks such as action recognition.

Bicubic downscaling is a prevalent technique used to reduce the video storage burden or to accelerate downstream processing. However, the inverse upscaling step is non-trivial, and the downscaled video may also degrade the performance of downstream tasks. In this paper, we propose a self-conditioned probabilistic framework for video rescaling that learns the paired downscaling and upscaling procedures simultaneously. During training, we decrease the entropy of the information lost in downscaling by maximizing its probability conditioned on the strong spatial-temporal prior information within the downscaled video. After optimization, the video downscaled by our framework preserves more meaningful information, which benefits both the upscaling step and downstream tasks, e.g., video action recognition. We further extend the framework to a lossy video compression system, in which a gradient estimator for non-differentiable industrial lossy codecs is proposed for end-to-end training of the whole system. Extensive experimental results demonstrate the superiority of our approach on video rescaling, video compression, and efficient action recognition tasks.
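The training objective described in the abstract (maximizing the probability of the lost information conditioned on the downscaled video) can be illustrated with a toy 1-D sketch. Everything here is an assumption made for illustration and not the paper's implementation: the pairwise-average downscaler `haar_split`, the fixed-variance Gaussian likelihood, and the hand-written `conditional_means` predictor, which is a hypothetical stand-in for a learned prior network. The sketch only shows that conditioning the prior on the downscaled signal can lower the negative log-likelihood of the lost detail.

```python
import math


def haar_split(signal):
    """Downscale by 2 (pairwise average) and return the detail that was lost."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    lost = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, lost


def haar_merge(low, lost):
    """Exact inverse of haar_split when the lost detail is available."""
    out = []
    for avg, diff in zip(low, lost):
        out.extend([avg + diff, avg - diff])
    return out


def gaussian_nll(values, means, std):
    """Negative log-likelihood of values under independent Gaussians."""
    const = math.log(std) + 0.5 * math.log(2 * math.pi)
    return sum(0.5 * ((v - m) / std) ** 2 + const for v, m in zip(values, means))


def conditional_means(low):
    """Hypothetical stand-in for a learned prior: predict the lost detail
    from the local slope of the downscaled signal (edges are clamped)."""
    last = len(low) - 1
    return [
        -(low[min(i + 1, last)] - low[max(i - 1, 0)]) / 8
        for i in range(len(low))
    ]


signal = [float(n) for n in range(8)]  # a simple ramp
low, lost = haar_split(signal)

std = 0.5  # assumed fixed standard deviation for both priors
nll_uncond = gaussian_nll(lost, [0.0] * len(lost), std)      # prior ignores `low`
nll_cond = gaussian_nll(lost, conditional_means(low), std)   # prior conditioned on `low`

print("unconditional NLL:", nll_uncond)
print("self-conditioned NLL:", nll_cond)
```

In a learned system the predictor would be trained to drive the conditional negative log-likelihood down, so that a sample drawn from the conditional prior at upscaling time is a good substitute for the discarded detail.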

Code Implementations: 1 repo