FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent
FlowMap provides a fully differentiable, gradient-descent-based alternative to traditional structure-from-motion (SfM) methods, and the camera parameters and depth it recovers enable photo-realistic novel view synthesis for applications in computer vision and graphics.
The paper tackles the problem of jointly estimating precise camera poses, intrinsics, and dense depth from video sequences, and reports performance on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis.
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce differentiable re-parameterizations of depth, intrinsics, and pose that are amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis (even though our method is purely gradient-descent based, fully differentiable, and presents a complete departure from conventional SfM).
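The objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, and the names `induced_flow` and `flow_loss`, the `(H, W)` depth layout, and the 4x4 relative-pose convention are hypothetical choices made for illustration. It shows only how depth, intrinsics, and a relative pose imply an optical flow field, and how that field is compared against a given flow with a least-squares error. The paper's re-parameterizations of depth, intrinsics, and pose, and its point-track term, are not reproduced here.

```python
import numpy as np

def induced_flow(depth, K, T_rel):
    """Flow from frame i to frame j implied by geometry (pinhole model assumed).

    depth: (H, W) per-pixel depth along the camera z-axis in frame i.
    K:     (3, 3) camera intrinsics shared by both frames.
    T_rel: (4, 4) rigid transform from frame-i to frame-j camera coordinates.
    Returns an (H, W, 2) array of (dx, dy) pixel displacements.
    """
    h, w = depth.shape
    # Pixel centers in (x, y) order, in homogeneous coordinates.
    xs, ys = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    # Unproject each pixel to a 3D point in frame i.
    points_i = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Move the points into frame j.
    points_j = points_i @ T_rel[:3, :3].T + T_rel[:3, 3]
    # Project into frame j and subtract the original pixel positions.
    proj = points_j @ K.T
    uv_j = proj[..., :2] / proj[..., 2:3]
    return uv_j - pix[..., :2]

def flow_loss(depth, K, T_rel, target_flow):
    """Mean squared error between the induced flow and a given target flow."""
    residual = induced_flow(depth, K, T_rel) - target_flow
    return np.mean(np.sum(residual ** 2, axis=-1))

# Example: a fronto-parallel plane at depth 2 seen by a camera shifted along x.
K = np.array([[100.0, 0.0, 8.0],
              [0.0, 100.0, 6.0],
              [0.0, 0.0, 1.0]])
depth = np.full((12, 16), 2.0)
T_rel = np.eye(4)
T_rel[0, 3] = 0.1  # points shift by 0.1 units along x in frame j
flow = induced_flow(depth, K, T_rel)
```

In this example every pixel moves by `fx * tx / depth = 100 * 0.1 / 2 = 5` pixels horizontally. In a gradient-descent setting, the same computation would be written in an automatic-differentiation framework so that the loss can be minimized with respect to depth, intrinsics, and poses.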