CV · Jan 25, 2025

TranStable: Towards Robust Pixel-level Online Video Stabilization by Jointing Transformer and CNN

Zhizhen Li, Tianyi Zhuo, Yifei Cao, Jizhe Yu, Yu Liu

arXiv:2501.15138v1 · 3.62 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses video stabilization for applications requiring robust pixel-level transformations, though it appears incremental as it combines existing Transformer and CNN techniques.

The paper tackles video stabilization challenges like distortion and excessive cropping by proposing TranStable, an end-to-end framework that uses a TransformerUNet generator and a Stability Discriminator Module, achieving state-of-the-art performance on benchmarks such as NUS, DeepStab, and Selfie.

Video stabilization often struggles with distortion and excessive cropping. This paper proposes a novel end-to-end framework, named TranStable, to address these challenges, comprising a generator and a discriminator. We establish TransformerUNet (TUNet) as the generator to utilize the Hierarchical Adaptive Fusion Module (HAFM), integrating Transformer and CNN to leverage both global and local features across multiple visual cues. By modeling frame-wise relationships, it generates robust pixel-level warping maps for stable geometric transformations. Furthermore, we design the Stability Discriminator Module (SDM), which provides pixel-wise supervision for authenticity and consistency during training, ensuring a more complete field of view while minimizing jitter artifacts and enhancing visual fidelity. Extensive experiments on NUS, DeepStab, and Selfie benchmarks demonstrate state-of-the-art performance.
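To make the phrase "pixel-level warping map" concrete, the following is a minimal sketch of how a dense per-pixel displacement field could be applied to a single grayscale frame. It is not the paper's implementation: the abstract does not specify the warping operator, so backward warping with bilinear sampling and border clamping is assumed here, and the names `warp_frame`, `bilinear_sample`, `flow_x`, and `flow_y` are hypothetical. In TranStable the map would be predicted by the generator; in this sketch it is simply given as input.

```python
from typing import List

Frame = List[List[float]]  # grayscale image indexed as frame[y][x]


def bilinear_sample(frame: Frame, sx: float, sy: float) -> float:
    """Sample frame at a fractional position (sx, sy) with bilinear interpolation."""
    h, w = len(frame), len(frame[0])
    # Assumption: positions outside the image are clamped to the border.
    sx = min(max(sx, 0.0), w - 1.0)
    sy = min(max(sy, 0.0), h - 1.0)
    x0, y0 = int(sx), int(sy)  # floor, since both values are non-negative
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = sx - x0, sy - y0  # fractional offsets in [0, 1)
    top = frame[y0][x0] * (1 - ax) + frame[y0][x1] * ax
    bottom = frame[y1][x0] * (1 - ax) + frame[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def warp_frame(frame: Frame, flow_x: Frame, flow_y: Frame) -> Frame:
    """Backward-warp a frame with a per-pixel displacement map.

    Each output pixel (x, y) is read from the input position
    (x + flow_x[y][x], y + flow_y[y][x]).
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = bilinear_sample(
                frame, x + flow_x[y][x], y + flow_y[y][x]
            )
    return out


if __name__ == "__main__":
    frame = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    zeros = [[0.0] * 3 for _ in range(3)]
    shift = [[1.0] * 3 for _ in range(3)]
    # Every pixel reads from one pixel to its right; the last column is clamped.
    print(warp_frame(frame, shift, zeros))
```

Because each pixel carries its own displacement, such a map can express non-rigid corrections that a single global homography cannot. Clamping at the border is only one option; a method concerned with field of view would need to handle out-of-frame samples more carefully than this sketch does.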
