IV | AI | GR | Feb 5, 2025

DC-VSR: Spatially and Temporally Consistent Video Super-Resolution with Video Diffusion Prior

arXiv:2502.03502v2 | 3 citations | h-index: 7 | SIGGRAPH
Originality: Incremental advance
AI Analysis

This work improves video super-resolution for applications such as video enhancement. The advance is incremental, since it builds on existing diffusion-based methods.

The paper addresses the spatio-temporal inconsistencies that arise in diffusion-based video super-resolution. It proposes DC-VSR, which combines two attention propagation schemes with a diffusion guidance scheme, and reports higher-quality results than previous approaches.

Video super-resolution (VSR) aims to reconstruct a high-resolution (HR) video from a low-resolution (LR) counterpart. Achieving successful VSR requires producing realistic HR details and ensuring both spatial and temporal consistency. To restore realistic details, diffusion-based VSR approaches have recently been proposed. However, the inherent randomness of diffusion, combined with their tile-based approach, often leads to spatio-temporal inconsistencies. In this paper, we propose DC-VSR, a novel VSR approach to produce spatially and temporally consistent VSR results with realistic textures. To achieve spatial and temporal consistency, DC-VSR adopts a novel Spatial Attention Propagation (SAP) scheme and a Temporal Attention Propagation (TAP) scheme that propagate information across spatio-temporal tiles based on the self-attention mechanism. To enhance high-frequency details, we also introduce Detail-Suppression Self-Attention Guidance (DSSAG), a novel diffusion guidance scheme. Comprehensive experiments demonstrate that DC-VSR achieves spatially and temporally consistent, high-quality VSR results, outperforming previous approaches.
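To make the idea of propagating information across tiles through self-attention more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names, the projection matrices, the tile sizes, and the choice of subsampling every fourth token are all assumptions made for illustration. It shows only the general pattern in which each tile keeps its own queries but attends to keys and values that also include tokens shared from other tiles.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    # Standard scaled dot-product attention over 2-D (tokens, dim) arrays.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def attention_with_shared_context(tile_tokens, shared_tokens, w_q, w_k, w_v):
    # Hypothetical helper: queries come from the tile only, while keys and
    # values are computed from the tile tokens plus tokens shared across tiles.
    q = tile_tokens @ w_q
    context = np.concatenate([tile_tokens, shared_tokens], axis=0)
    k = context @ w_k
    v = context @ w_v
    return self_attention(q, k, v)

rng = np.random.default_rng(0)
dim = 8
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

# Two tiles of 16 tokens each (illustrative sizes).
tile_a = rng.standard_normal((16, dim))
tile_b = rng.standard_normal((16, dim))

# Assumed sharing rule: every fourth token of each tile is visible to all tiles.
shared = np.concatenate([tile_a[::4], tile_b[::4]], axis=0)

out_a = attention_with_shared_context(tile_a, shared, w_q, w_k, w_v)
out_b = attention_with_shared_context(tile_b, shared, w_q, w_k, w_v)
```

Because both tiles attend to the same shared tokens, their outputs are conditioned on common context rather than being computed in isolation, which is the intuition behind using attention to reduce seams between tiles. With an empty set of shared tokens, the helper reduces to ordinary per-tile self-attention.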

