IV AI GR · Feb 5, 2025

DC-VSR: Spatially and Temporally Consistent Video Super-Resolution with Video Diffusion Prior

Janghyeok Han, Gyujin Sim, Geonung Kim, Hyun-seung Lee, Kyuha Choi, Youngseok Han, Sunghyun Cho

arXiv:2502.03502v2 · 11.33 citations · h-index: 7 · SIGGRAPH

Originality: Incremental advance

AI Analysis

This work improves video super-resolution for applications like video enhancement, though it is incremental as it builds on existing diffusion-based methods.

The paper addresses the spatio-temporal inconsistencies of diffusion-based video super-resolution. It proposes DC-VSR, which combines two attention propagation schemes with a new guidance scheme, and reports high-quality results that outperform previous approaches.

Video super-resolution (VSR) aims to reconstruct a high-resolution (HR) video from a low-resolution (LR) counterpart. Successful VSR requires producing realistic HR details while ensuring both spatial and temporal consistency. To restore realistic details, diffusion-based VSR approaches have recently been proposed. However, the inherent randomness of diffusion, combined with the tile-based processing these approaches rely on, often leads to spatio-temporal inconsistencies. In this paper, we propose DC-VSR, a novel VSR approach that produces spatially and temporally consistent results with realistic textures. To achieve spatial and temporal consistency, DC-VSR adopts a novel Spatial Attention Propagation (SAP) scheme and a Temporal Attention Propagation (TAP) scheme, both of which propagate information across spatio-temporal tiles through the self-attention mechanism. To enhance high-frequency details, we also introduce Detail-Suppression Self-Attention Guidance (DSSAG), a novel diffusion guidance scheme. Comprehensive experiments demonstrate that DC-VSR achieves spatially and temporally consistent, high-quality VSR results, outperforming previous approaches.
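
The abstract does not spell out how SAP and TAP work, so the following is only a generic sketch of the underlying idea: a tile's self-attention can see information from outside the tile if extra tokens are appended to its keys and values. It is not the paper's implementation. The function names (`attend`, `attend_with_propagation`) are hypothetical, and the learned query/key/value projections of a real attention layer are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on plain lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
        )
    return outputs


def attend_with_propagation(tile_tokens, propagated_tokens):
    """Hypothetical helper: queries come from the current tile only, while
    keys and values also include tokens propagated from other tiles.
    Assumption for illustration; the paper's SAP/TAP may differ in detail."""
    context = tile_tokens + propagated_tokens
    return attend(tile_tokens, context, context)


# Toy example: a two-token tile and one token borrowed from a neighboring tile.
tile = [[1.0, 0.0], [0.0, 1.0]]
neighbor = [[5.0, 5.0]]
isolated = attend(tile, tile, tile)
shared = attend_with_propagation(tile, neighbor)
```

In this toy setting, `shared` is pulled toward the neighbor's token while `isolated` depends on the tile alone, which is the kind of cross-tile coupling that tile-by-tile processing lacks.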
