CV · Mar 6

VS3R: Robust Full-frame Video Stabilization via Deep 3D Reconstruction

Muhua Zhu, Xinhao Jin, Yu Zhang, Yifei Xue, Tie Ji, Yizhen Lao

arXiv:2603.05851v1 · 12.9 · h-index: 24

Predicted impact: top 28% in CV · last 90 days · Originality: Highly original

AI Analysis

This work provides a more robust and visually consistent video stabilization solution for users dealing with extreme camera motions, improving upon existing 2D and 3D techniques.

This paper addresses the trade-off in video stabilization between geometric robustness and full-frame consistency, where 2D methods crop aggressively and 3D methods fail under extreme motion. The proposed VS3R framework, combining feed-forward 3D reconstruction with generative video diffusion, achieves high-fidelity, full-frame stabilization across diverse camera models and significantly outperforms state-of-the-art methods in robustness and visual quality.

Video stabilization aims to mitigate camera shake but faces a fundamental trade-off between geometric robustness and full-frame consistency. While 2D methods suffer from aggressive cropping, 3D techniques are often undermined by fragile optimization pipelines that fail under extreme motions. To bridge this gap, we propose VS3R, a framework that synergizes feed-forward 3D reconstruction with generative video diffusion. Our pipeline jointly estimates camera parameters, depth, and masks to ensure all-scenario reliability, and introduces a Hybrid Stabilized Rendering module that fuses semantic and geometric cues for dynamic consistency. Finally, a Dual-Stream Video Diffusion Model restores disoccluded regions and rectifies artifacts by synergizing structural guidance with semantic anchors. Collectively, VS3R achieves high-fidelity, full-frame stabilization across diverse camera models and significantly outperforms state-of-the-art methods in robustness and visual quality.
