DeepV2D: Video to Depth with Differentiable Structure from Motion
This work addresses depth estimation from video, a core problem in computer vision. It presents an incremental improvement that combines classical geometric algorithms with deep learning in a single differentiable framework.
The paper proposes DeepV2D, an end-to-end deep learning architecture that predicts depth from video by integrating neural networks with geometric principles. It alternates between a motion stage and a depth stage to arrive at an accurate depth estimate.
We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to an accurate depth estimate. Code is available at https://github.com/princeton-vl/DeepV2D.
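The alternation between motion estimation and depth estimation during inference can be illustrated with a minimal control-flow sketch. This is not the interface of the released code: the function name `alternate_motion_and_depth`, the callables `motion_step` and `depth_step`, the `init_depth` argument, and the fixed iteration count `num_iters` are all hypothetical names chosen for illustration, and the two stages are left as opaque callables rather than the trainable modules described above.

```python
from typing import Callable, Tuple, TypeVar

# Placeholder types: the real representations of frames, depth, and motion
# are not specified here, so they are kept generic (assumption).
Frames = TypeVar("Frames")
Depth = TypeVar("Depth")
Motion = TypeVar("Motion")


def alternate_motion_and_depth(
    frames: Frames,
    init_depth: Depth,
    motion_step: Callable[[Frames, Depth], Motion],
    depth_step: Callable[[Frames, Motion], Depth],
    num_iters: int = 5,
) -> Tuple[Depth, Motion]:
    """Alternate two stages for a fixed number of iterations.

    Hypothetical sketch: each iteration re-estimates motion from the
    current depth, then re-estimates depth from the updated motion.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    depth = init_depth
    for _ in range(num_iters):
        # Stage 1: update motion using the current depth estimate.
        motion = motion_step(frames, depth)
        # Stage 2: update depth using the freshly estimated motion.
        depth = depth_step(frames, motion)
    return depth, motion
```

A fixed iteration count is used here only to keep the sketch simple; a stopping rule based on how much the depth estimate changes between iterations would be an equally plausible design under the same assumptions.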