CV | LG | IV | Mar 5, 2025

Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment

arXiv:2503.03355v5 | 2 citations | h-index: 2 | ICSPS
Originality: Incremental advance
AI Analysis

This work addresses video super-resolution with an alignment-free approach. The advance is incremental, since it builds on existing diffusion methods.

The authors introduce a diffusion-based video super-resolution method that eliminates the need for motion alignment. A diffusion transformer serves as a space-time model and handles motion patterns as prior knowledge. Results on synthetic and real-world datasets indicate that the approach is feasible, and a single model adapts to different sampling conditions without re-training.

In this work, we rethink the approach to video super-resolution by introducing a method based on the Diffusion Posterior Sampling framework, combined with an unconditional video diffusion transformer operating in latent space. The video generation model, a diffusion transformer, functions as a space-time model. We argue that a powerful model, which learns the physics of the real world, can easily handle various kinds of motion patterns as prior knowledge, thus eliminating the need for explicit estimation of optical flows or motion parameters for pixel alignment. Furthermore, a single instance of the proposed video diffusion transformer model can adapt to different sampling conditions without re-training. Empirical results on synthetic and real-world datasets illustrate the feasibility of diffusion-based, alignment-free video super-resolution.
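
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of a generic Diffusion Posterior Sampling loop, not the authors' implementation. It assumes a toy 1-D signal in place of video latents, average pooling as the degradation operator, and a standard-normal prior whose denoiser is known in closed form (a stand-in for the learned video diffusion transformer). The names `downsample`, `downsample_adjoint`, `toy_denoiser`, and `dps_sample`, as well as the step-size rule, are illustrative assumptions.

```python
import numpy as np

def downsample(x, factor):
    """Average-pool a 1-D signal by `factor` (assumed degradation operator A)."""
    return x.reshape(-1, factor).mean(axis=1)

def downsample_adjoint(r, factor):
    """Adjoint A^T of average pooling: spread each value evenly over its block."""
    return np.repeat(r, factor) / factor

def toy_denoiser(x_t, abar_t):
    """E[x0 | x_t] for x0 ~ N(0, I); placeholder for a learned diffusion model."""
    return np.sqrt(abar_t) * x_t

def dps_sample(y, factor, betas, zeta=1.0, seed=0):
    """Toy DPS loop: DDPM ancestral step plus a measurement-consistency gradient."""
    rng = np.random.default_rng(seed)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    n = y.size * factor
    x = rng.standard_normal(n)  # start from pure noise
    for t in range(len(betas) - 1, -1, -1):
        abar_prev = abar[t - 1] if t > 0 else 1.0
        x0_hat = toy_denoiser(x, abar[t])  # estimate of the clean signal
        # Unconditional DDPM posterior mean and standard deviation
        mean = (np.sqrt(alphas[t]) * (1.0 - abar_prev) * x
                + np.sqrt(abar_prev) * betas[t] * x0_hat) / (1.0 - abar[t])
        sigma = np.sqrt(betas[t] * (1.0 - abar_prev) / (1.0 - abar[t]))
        noise = rng.standard_normal(n) if t > 0 else np.zeros(n)
        x_prev = mean + sigma * noise
        # Gradient of ||y - A(x0_hat)||^2 with respect to x_t; the chain rule
        # through x0_hat = sqrt(abar_t) * x_t is analytic for this toy prior.
        residual = y - downsample(x0_hat, factor)
        grad = -2.0 * np.sqrt(abar[t]) * downsample_adjoint(residual, factor)
        # Step size scaled by the residual norm (an assumed heuristic)
        x = x_prev - zeta / (np.linalg.norm(residual) + 1e-8) * grad
    return x

# Usage: recover a 16-sample signal from 4 low-resolution measurements.
y_low = np.array([0.5, -0.5, 1.0, 0.0])
x_high = dps_sample(y_low, factor=4, betas=np.linspace(1e-4, 0.02, 50))
```

In this sketch the prior never sees the measurement; only the gradient term ties the sample to `y_low`. Changing `factor` or the degradation operator therefore needs no retraining of the prior, which is consistent with the abstract's claim about adapting to different sampling conditions.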

