CV | Oct 14, 2025

FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

arXiv:2510.12747v1 | 22 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses the high latency and poor scalability that keep diffusion-based video super-resolution out of real-world applications. It is a significant incremental improvement in efficiency.

The paper tackles the challenge of making diffusion-based video super-resolution practical by proposing FlashVSR, a framework that runs at approximately 17 FPS on 768x1408 videos on a single A100 GPU, up to 12x faster than prior one-step diffusion VSR models.
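To put the headline numbers in perspective, the short sketch below derives a per-frame time budget, a pixel throughput, and the baseline frame rate implied by the quoted speedup. It uses only the figures stated above (17 FPS, 768x1408, 12x). The function names are illustrative, and because the speedup is quoted as "up to" 12x, the implied baseline is a rough reference point rather than a measured value.

```python
# Figures quoted in the summary; everything derived from them is approximate.
FPS = 17.0                   # "approximately 17 FPS"
DIM_A, DIM_B = 768, 1408     # frame dimensions as quoted (768x1408)
MAX_SPEEDUP = 12.0           # "up to 12x" over prior one-step diffusion VSR models


def frame_latency_ms(fps: float) -> float:
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def pixels_per_second(fps: float, dim_a: int, dim_b: int) -> float:
    """Pixels produced per second at a given frame rate and frame size."""
    return fps * dim_a * dim_b


def implied_baseline_fps(fps: float, speedup: float) -> float:
    """Frame rate of a baseline that is `speedup` times slower."""
    return fps / speedup


if __name__ == "__main__":
    print(f"per-frame budget: {frame_latency_ms(FPS):.1f} ms")
    print(f"throughput: {pixels_per_second(FPS, DIM_A, DIM_B) / 1e6:.1f} Mpx/s")
    print(f"implied baseline: {implied_baseline_fps(FPS, MAX_SPEEDUP):.2f} FPS")
```

At 17 FPS the budget is just under 60 ms per frame, which is why the abstract frames the result as a step "towards" real-time rather than full 24 or 30 FPS playback.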

Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving efficiency, scalability, and real-time performance. To this end, we propose FlashVSR, the first diffusion-based one-step streaming framework towards real-time VSR. FlashVSR runs at approximately 17 FPS for 768x1408 videos on a single A100 GPU by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train-test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct VSR-120K, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves state-of-the-art performance with up to 12x speedup over prior one-step diffusion VSR models. We will release the code, pretrained models, and dataset to foster future research in efficient diffusion-based VSR.
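The abstract does not spell out how the locality-constrained sparse attention works, so the following is only a generic illustration of the underlying idea, not FlashVSR's actual mechanism. It restricts each query token on a 2D grid to keys within a fixed spatial window, so the attended neighbourhood stays the same size however large the frame becomes. The helper names (`local_window_mask`, `masked_attention`) and the square-window rule are assumptions made for this sketch, and the mask is applied densely here for clarity, whereas a real sparse kernel would skip the masked entries entirely.

```python
import numpy as np


def local_window_mask(height: int, width: int, radius: int) -> np.ndarray:
    """Boolean (H*W, H*W) mask: True where the key lies within `radius`
    rows and columns of the query on the token grid."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    ys, xs = ys.reshape(-1), xs.reshape(-1)
    dy = np.abs(ys[:, None] - ys[None, :])
    dx = np.abs(xs[:, None] - xs[None, :])
    return (dy <= radius) & (dx <= radius)


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention where disallowed pairs get zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Each query can always see itself, so every row has a finite maximum.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, dim = 8, 8, 16
    q, k, v = (rng.standard_normal((h * w, dim)) for _ in range(3))
    mask = local_window_mask(h, w, radius=1)
    out = masked_attention(q, k, v, mask)
    print(out.shape, f"fraction of pairs kept: {mask.mean():.3f}")
```

Because the window size is fixed in tokens, the set of relative positions each query sees does not depend on the input resolution. That is one plausible reading of how a locality constraint could help with the train-test resolution gap the abstract mentions.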
