A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate Representations
For video compression researchers, this work addresses the bottleneck of decoding ultra-low-bitrate videos, offering a method that balances fidelity and perceptual quality.
The paper tackles video reconstruction from ultra-low-bitrate representations, where encoding is trivial but decoding is hard. The proposed causal diffusion model outperforms classical, neural, generative, and semantic baselines in fidelity, temporal consistency, and perceptual quality.
We study video reconstruction from ultra-low-bitrate representations, where the primary challenge shifts from encoding to decoding. In this regime, reconstruction with classical and neural codecs introduces blur, while generative and semantic approaches often struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. To address these limitations, we propose a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed frames by jointly modeling their complementary information. We further introduce temporal-only distillation from a bidirectional teacher to enable parameter-efficient training and causal few-step inference. Through extensive quantitative, qualitative, and subjective evaluation, we show that the proposed method outperforms classical, neural, generative, and semantic baselines in ultra-low-bitrate video reconstruction.