Controllable Generative Video Compression
This work addresses the challenge of balancing perception and fidelity in video compression and offers a solution for applications that require high-quality video reproduction, though it is an incremental advance in generative compression techniques.
The paper tackles the trade-off between perceptual realism and signal fidelity in generative video compression by proposing a controllable paradigm that uses keyframes and dense per-frame priors to guide non-keyframe generation, achieving improvements in both fidelity and perceptual quality over previous methods.
Perceptual video compression adopts generative video modeling to improve perceptual realism but frequently sacrifices signal fidelity, diverging from the goal of video compression: to faithfully reproduce the visual signal. To alleviate this dilemma between perception and fidelity, we propose the Controllable Generative Video Compression (CGVC) paradigm, which faithfully generates details guided by multiple visual conditions. Under this paradigm, representative keyframes of the scene are coded and used to provide structural priors for non-keyframe generation. A dense per-frame control prior is additionally coded to better preserve the finer structure and semantics of each non-keyframe. Guided by these priors, non-keyframes are reconstructed by a controllable video generation model with temporal and content consistency. Furthermore, to accurately recover the color information of the video, we develop a color-distance-guided keyframe selection algorithm that adaptively chooses keyframes. Experimental results show that CGVC outperforms previous perceptual video compression methods in terms of both signal fidelity and perceptual quality.
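The abstract does not say how the color distance is defined or how keyframes are chosen from it. One plausible reading is a greedy rule that starts a new keyframe whenever a frame's colors drift too far from the current keyframe. The sketch below illustrates that idea only under stated assumptions: frames are flat lists of 8-bit RGB pixels, color distance is the total-variation distance between coarse joint RGB histograms, and a fixed `threshold` triggers a new keyframe. The names `color_histogram`, `color_distance`, `select_keyframes`, and the parameter values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical frame representation: a flat sequence of 8-bit RGB pixels.
Pixel = Tuple[int, int, int]
Frame = Sequence[Pixel]


def color_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = float(len(frame))
    return [h / total for h in hist]


def color_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Total-variation distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def select_keyframes(frames: Sequence[Frame], threshold: float = 0.3,
                     bins: int = 4) -> List[int]:
    """Assumed greedy rule: open a new keyframe when colors drift past `threshold`."""
    if not frames:
        return []
    keyframes = [0]  # the first frame always serves as a keyframe
    ref = color_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        hist = color_histogram(frames[i], bins)
        if color_distance(ref, hist) > threshold:
            keyframes.append(i)
            ref = hist  # later frames are compared against the new keyframe
    return keyframes


# Toy usage: uniform red and blue "frames" of 16 pixels each.
red = [(255, 0, 0)] * 16
blue = [(0, 0, 255)] * 16
print(select_keyframes([red, red, blue, blue, red]))  # [0, 2, 4]
```

Comparing each frame against the most recent keyframe, rather than against its immediate predecessor, keeps a slow color drift from going unnoticed; a real codec would likely also weigh bitrate cost and use a perceptual color space, which this sketch ignores.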