IV | CV | Aug 11, 2025

DiffVC-OSD: One-Step Diffusion-based Perceptual Neural Video Compression Framework

arXiv:2508.07682v1 | 16.75 citations | h-index: 2 | VCIP

Originality: Highly original

AI Analysis

This work addresses the decoding speed bottleneck for diffusion-based video compression, offering a significant improvement for real-time applications.

The paper tackles slow decoding in diffusion-based neural video compression by proposing DiffVC-OSD, a one-step diffusion framework that enhances perceptual quality in a single diffusion step. It achieves about 20× faster decoding and an 86.92% bitrate reduction compared to the corresponding multi-step diffusion-based variant.

In this work, we first propose DiffVC-OSD, a One-Step Diffusion-based Perceptual Neural Video Compression framework. Unlike conventional multi-step diffusion-based methods, DiffVC-OSD feeds the reconstructed latent representation directly into a One-Step Diffusion Model, enhancing perceptual quality through a single diffusion step guided by both temporal context and the latent itself. To better leverage temporal dependencies, we design a Temporal Context Adapter that encodes conditional inputs into multi-level features, offering more fine-grained guidance for the Denoising Unet. Additionally, we employ an End-to-End Finetuning strategy to improve overall compression performance. Extensive experiments demonstrate that DiffVC-OSD achieves state-of-the-art perceptual compression performance and offers about 20× faster decoding and an 86.92% bitrate reduction compared to the corresponding multi-step diffusion-based variant.
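To make the one-step versus multi-step contrast concrete, here is a minimal toy sketch in plain Python. It is not the DiffVC-OSD implementation: `CountingDenoiser`, `one_step_enhance`, `multi_step_enhance`, the `alpha_bar` values, and the toy noise prediction are all hypothetical names and assumptions introduced for illustration. The only standard pieces are the DDPM-style clean-sample estimate and the deterministic DDIM-style update; the abstract does not state which formulation the paper uses. The sketch shows only why a single pass needs one denoiser call while a sampler with T steps needs T.

```python
import math


def predict_x0(x_t, eps_pred, alpha_bar_t):
    """DDPM-style clean-sample estimate from a noise prediction:
    x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise * e) / signal for x, e in zip(x_t, eps_pred)]


class CountingDenoiser:
    """Hypothetical stand-in for a conditional denoising network.
    It only counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x_t, step, context):
        self.calls += 1
        # Toy "noise prediction": deviation of the input from the context.
        return [x - c for x, c in zip(x_t, context)]


def one_step_enhance(latent, context, denoiser, alpha_bar_t=0.5):
    """Assumed one-step scheme: a single denoiser call on the latent,
    followed by a direct clean-sample estimate."""
    eps = denoiser(latent, 0, context)
    return predict_x0(latent, eps, alpha_bar_t)


def multi_step_enhance(x_start, context, denoiser, alpha_bars):
    """Deterministic DDIM-style loop: one denoiser call per schedule entry.
    alpha_bars runs from the noisiest level to the cleanest."""
    x = x_start
    for i, a_t in enumerate(alpha_bars):
        eps = denoiser(x, i, context)
        x0 = predict_x0(x, eps, a_t)
        # After the last entry, step to the fully clean level (alpha_bar = 1).
        a_prev = alpha_bars[i + 1] if i + 1 < len(alpha_bars) else 1.0
        x = [math.sqrt(a_prev) * z + math.sqrt(1.0 - a_prev) * e
             for z, e in zip(x0, eps)]
    return x
```

Calling `one_step_enhance` increments the denoiser's call counter once, whereas `multi_step_enhance` with a four-entry schedule increments it four times. That call count is the decoding cost a one-step design avoids.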

