CV | Jul 17, 2023

NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation

Yiran Wang, Min Shi, Jiaqi Li, Chaoyi Hong, Zihao Huang, Juewen Peng, Zhiguo Cao, Jianming Zhang, Ke Xian, Guosheng Lin

arXiv:2307.08695v3 | 12.68 citations | h-index: 51 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the inefficiency and lack of robustness of video depth estimation in computer vision applications. The advance appears incremental, since it builds on existing single-image depth models and adds stabilization techniques on top of them.

The paper tackles the problem of achieving temporally consistent depth estimation in videos by introducing NVDS+, a plug-and-play stabilizer for single-image depth models, and a large-scale dataset (VDW) with 14,203 videos. It shows significant improvements in consistency, accuracy, and efficiency across multiple benchmarks and extends to applications like semantic segmentation and 3D reconstruction.

Video depth estimation aims to infer temporally consistent depth. One approach is to finetune a single-image model on each video with geometry constraints, which proves inefficient and lacks robustness. An alternative is learning to enforce consistency from data, which requires well-designed models and sufficient video depth data. To address both challenges, we introduce NVDS+, which stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner. We also construct a large-scale Video Depth in the Wild (VDW) dataset, which contains 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset. Additionally, a bidirectional inference strategy is designed to improve consistency by adaptively fusing forward and backward predictions. We instantiate a model family ranging from small to large scales for different applications. The method is evaluated on the VDW dataset and three public benchmarks. To further demonstrate its versatility, we extend NVDS+ to video semantic segmentation and several downstream applications like bokeh rendering, novel view synthesis, and 3D reconstruction. Experimental results show that our method achieves significant improvements in consistency, accuracy, and efficiency. Our work serves as a solid baseline and data foundation for learning-based video depth estimation. Code and dataset are available at: https://github.com/RaymondWang987/NVDS
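The bidirectional inference idea in the abstract can be illustrated with a minimal sketch. This is not the NVDS+ implementation: `stabilize` is a hypothetical stand-in for a learned stabilizer, `mean_stabilizer` is a toy placeholder, and the fixed blending weight `alpha` is an assumption that replaces the adaptive fusion, which the abstract does not detail. Depth maps are tiny nested lists purely for illustration.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # a tiny depth map as nested lists (illustrative only)


def fuse_bidirectional(
    depths: Sequence[Frame],
    stabilize: Callable[[Sequence[Frame]], Frame],
    window: int = 2,
    alpha: float = 0.5,
) -> List[Frame]:
    """Blend forward and backward stabilized predictions for every frame.

    `stabilize` (hypothetical) takes a window of frames whose LAST element
    is the target frame and returns a refined depth map for that target.
    `alpha` is a fixed weight assumed here, not the paper's adaptive fusion.
    """
    n = len(depths)
    fused: List[Frame] = []
    for t in range(n):
        # Forward pass: earlier frames as references, target frame last.
        fwd_window = depths[max(0, t - window): t + 1]
        # Backward pass: later frames in reversed order, target frame last.
        bwd_window = list(reversed(depths[t: min(n, t + window + 1)]))
        fwd = stabilize(fwd_window)
        bwd = stabilize(bwd_window)
        # Per-pixel weighted blend of the two directional predictions.
        fused.append([
            [alpha * f + (1.0 - alpha) * b for f, b in zip(frow, brow)]
            for frow, brow in zip(fwd, bwd)
        ])
    return fused


def mean_stabilizer(frames: Sequence[Frame]) -> Frame:
    """Toy placeholder: per-pixel mean over the window."""
    k = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[r][c] for f in frames) / k for c in range(cols)]
        for r in range(rows)
    ]
```

In this sketch each frame sees both past and future context, so a flickering value in one frame is pulled toward its neighbors from both directions. A real stabilizer would replace `mean_stabilizer` with a trained network, and the blend weight would be predicted rather than fixed.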
