CV | Jan 28

DiffVC-RT: Towards Practical Real-Time Diffusion-based Perceptual Neural Video Compression

arXiv:2601.20564v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses practical deployment issues in diffusion-based video compression for applications that require real-time performance. It represents a significant milestone, although its improvements in efficiency and consistency are incremental.

The paper tackles the challenges of diffusion-based neural video compression with DiffVC-RT, which achieves 80.1% bitrate savings (measured with LPIPS) over VTM-17.0 on the HEVC dataset, with real-time encoding and decoding speeds of 206 and 30 fps, respectively, for 720p videos.
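The abstract states the 80.1% saving "in terms of LPIPS" but does not say how it is computed. Such figures are conventionally reported as a Bjøntegaard delta rate (BD-rate): fit log-bitrate as a function of quality for each codec, average the gap between the two fits over their shared quality range, and convert it to a percentage. The following is a minimal sketch of that calculation, assuming the paper follows this convention. Because LPIPS is lower-is-better, the sketch negates it before fitting. The function name `bd_rate` and the rate/LPIPS points are hypothetical toy values, not results from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test):
    """Bjontegaard delta rate (BD-rate) in percent.

    Negative values mean the test codec needs fewer bits than the anchor
    at equal quality. `qual_*` must be "higher is better"; for LPIPS,
    pass the negated scores.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(qual_anchor, dtype=float)
    qt = np.asarray(qual_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the quality range the two curves share.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Mean log-rate gap -> relative bitrate change in percent.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Toy, made-up rate/LPIPS points (not from the paper): the "test" codec
# reaches each LPIPS value at half the anchor's bitrate.
anchor_bpp = [0.02, 0.04, 0.08, 0.16]
anchor_lpips = [0.30, 0.25, 0.20, 0.15]
test_bpp = [r * 0.5 for r in anchor_bpp]
test_lpips = anchor_lpips

saving = bd_rate(anchor_bpp, [-v for v in anchor_lpips],
                 test_bpp, [-v for v in test_lpips])
print(f"BD-rate: {saving:.1f}%")
```

With four points per curve the cubic fit interpolates exactly, so halving every rate yields a BD-rate of about -50%. Real evaluations typically use more rate points per codec and may integrate with piecewise-cubic interpolation instead of a single polynomial.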

The practical deployment of diffusion-based Neural Video Compression (NVC) faces critical challenges, including severe information loss, prohibitive inference latency, and poor temporal consistency. To bridge this gap, we propose DiffVC-RT, the first framework designed to achieve real-time diffusion-based perceptual NVC. First, we introduce an Efficient and Informative Model Architecture. Through strategic module replacements and pruning, this architecture significantly reduces computational complexity while mitigating structural information loss. Second, to address generative flickering artifacts, we propose Explicit and Implicit Consistency Modeling. We enhance temporal consistency by explicitly incorporating a zero-cost Online Temporal Shift Module within the U-Net, complemented by hybrid implicit consistency constraints. Finally, we present an Asynchronous and Parallel Decoding Pipeline incorporating Mixed Half Precision, which enables asynchronous latent decoding and parallel frame reconstruction via a Batch-dimension Temporal Shift design. Experiments show that DiffVC-RT achieves 80.1% bitrate savings in terms of LPIPS over VTM-17.0 on the HEVC dataset, with real-time encoding and decoding speeds of 206 and 30 fps, respectively, for 720p videos on an NVIDIA H800 GPU, marking a significant milestone in diffusion-based video compression.
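The abstract names the Online Temporal Shift Module and the Batch-dimension Temporal Shift but does not specify them. The sketch below assumes they follow the usual causal temporal-shift idea from the Temporal Shift Module (TSM) line of work: a fixed fraction of each frame's feature channels (here 1/8, an assumed value) is replaced by the same channels from the previous frame. The shift only moves memory and adds no parameters or multiply-adds, which is one way to read "zero-cost". The per-frame form keeps a cache of the previous frame's channels for streaming decoding; the batch form stacks T frames along the batch axis and performs the same shift with one slice assignment, so all frames can pass through the network in a single batched call. The function names and the `fold_div` parameter are hypothetical.

```python
import numpy as np


def online_temporal_shift(x_t, cache, fold_div=8):
    """Causal channel shift for a single frame (online / streaming use).

    x_t:   current-frame features, shape (C, H, W).
    cache: first C // fold_div channels of the previous frame, or None.
    Returns the shifted features and the cache for the next frame.
    """
    fold = x_t.shape[0] // fold_div
    new_cache = x_t[:fold].copy()  # saved for the next frame
    out = x_t.copy()
    # Swap in the previous frame's channels; zero-pad for the first frame.
    out[:fold] = 0.0 if cache is None else cache
    return out, new_cache


def batch_temporal_shift(x, fold_div=8):
    """Same causal shift, with T frames stacked on the batch axis.

    x: features of T consecutive frames, shape (T, C, H, W).
    """
    fold = x.shape[1] // fold_div
    out = x.copy()
    out[1:, :fold] = x[:-1, :fold]  # frame t takes channels from frame t - 1
    out[0, :fold] = 0.0             # no earlier frame inside this batch
    return out


# Random features for 5 frames, 16 channels, 4x4 spatial (toy sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16, 4, 4)).astype(np.float32)

cache = None
streamed = []
for frame in x:  # one frame at a time, as in sequential decoding
    y, cache = online_temporal_shift(frame, cache)
    streamed.append(y)
streamed = np.stack(streamed)

batched = batch_temporal_shift(x)  # all frames in one call
print(np.array_equal(streamed, batched))  # True
```

Both forms produce identical outputs for a single shift layer, which is what allows sequential (online) and batched (parallel) reconstruction to agree. In a full U-Net, the streaming form would need one cache per shift layer; carrying the last frame's channels across consecutive batches is an additional detail this sketch omits.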
