CV · Mar 27, 2022

RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

arXiv:2203.14186v1 · 105 citations · h-index: 14 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck for real-time video enhancement applications, offering a more efficient solution for tasks requiring high-frame-rate and high-resolution video generation.

The paper tackles slow inference in space-time video super-resolution by proposing a real-time spatial-temporal transformer that integrates spatial and temporal super-resolution into a single model. Compared with the prior state of the art, it reports a 60% reduction in parameters and 80% faster inference without significant performance loss.
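The rounded percentages in this summary can be checked against the figures the abstract reports (4.5M vs 12.3M parameters; 26.2 fps vs 14.3 fps on 720×576 frames). The snippet below is a minimal sketch of that arithmetic; the function and constant names are illustrative and do not come from the paper or its repository.

```python
# Sanity check of the headline efficiency claims, using only the
# numbers quoted in the abstract. All names here are illustrative.

def relative_reduction(new: float, baseline: float) -> float:
    """Fraction by which `new` is smaller than `baseline`."""
    return (baseline - new) / baseline


def relative_speedup(new_fps: float, baseline_fps: float) -> float:
    """Fraction by which `new_fps` exceeds `baseline_fps`."""
    return (new_fps - baseline_fps) / baseline_fps


RSTT_PARAMS_M, TMNET_PARAMS_M = 4.5, 12.3   # millions of parameters
RSTT_FPS, TMNET_FPS = 26.2, 14.3            # frames per second

param_reduction = relative_reduction(RSTT_PARAMS_M, TMNET_PARAMS_M)
fps_gain = relative_speedup(RSTT_FPS, TMNET_FPS)

# Per-frame latency in milliseconds is simply 1000 / fps.
rstt_ms = 1000.0 / RSTT_FPS
tmnet_ms = 1000.0 / TMNET_FPS

print(f"parameter reduction: {param_reduction:.1%}")  # about 63%
print(f"throughput gain:     {fps_gain:.1%}")         # about 83%
print(f"latency per frame:   {rstt_ms:.1f} ms vs {tmnet_ms:.1f} ms")
```

Both results sit slightly above the rounded 60% and 80% figures, so the headline claims are, if anything, conservative.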

Space-time video super-resolution (STVSR) is the task of interpolating videos that have both a Low Frame Rate (LFR) and Low Resolution (LR) to produce High-Frame-Rate (HFR), High-Resolution (HR) counterparts. Existing methods based on Convolutional Neural Networks (CNNs) achieve visually satisfying results but suffer from slow inference because of their heavy architectures. We propose to resolve this issue with a spatial-temporal transformer that naturally incorporates the spatial and temporal super-resolution modules into a single model. Unlike CNN-based methods, we do not use separate building blocks for temporal interpolation and spatial super-resolution; instead, we use a single end-to-end transformer architecture. Specifically, the encoders build a reusable dictionary from the input LFR and LR frames, which the decoder then uses to synthesize the HFR and HR frames. Compared with the state-of-the-art TMNet (Xu et al., 2021), our network is 60% smaller (4.5M vs. 12.3M parameters) and 80% faster (26.2 fps vs. 14.3 fps on 720×576 frames) without sacrificing much performance. The source code is available at https://github.com/llmpass/RSTT.
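To make the task definition concrete, the sketch below computes the shape of an STVSR output clip from the shape of its LFR, LR input. The abstract does not state the scale factors, so the defaults used here (4× spatial upscaling and one synthesized frame between each pair of input frames) are assumptions chosen for illustration, as are the names `ClipShape` and `stvsr_output_shape`.

```python
from typing import NamedTuple


class ClipShape(NamedTuple):
    """Shape of a video clip: frame count, height, and width in pixels."""
    frames: int
    height: int
    width: int


def stvsr_output_shape(
    lr: ClipShape,
    spatial_scale: int = 4,    # assumed; not stated in the abstract
    frames_between: int = 1,   # assumed; not stated in the abstract
) -> ClipShape:
    """Return the HFR/HR clip shape for a given LFR/LR input shape.

    Temporal part: `frames_between` new frames are synthesized in each
    gap between adjacent input frames, and the input frames are kept.
    Spatial part: height and width are multiplied by `spatial_scale`.
    """
    if lr.frames < 2:
        raise ValueError("need at least two input frames to interpolate")
    out_frames = lr.frames + (lr.frames - 1) * frames_between
    return ClipShape(out_frames, lr.height * spatial_scale, lr.width * spatial_scale)


# Hypothetical input: 4 frames of 64x64 pixels.
hr = stvsr_output_shape(ClipShape(frames=4, height=64, width=64))
print(hr)  # ClipShape(frames=7, height=256, width=256)
```

A single model in the sense of the abstract produces all of these output frames at the higher resolution directly, rather than chaining a frame-interpolation stage and a separate upscaling stage.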

Code Implementations: 1 repo
