CV | Dec 12, 2024

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Yuzheng Liu, Siyan Dong, Shuzhe Wang, Yingda Yin, Yanchao Yang, Qingnan Fan, Baoquan Chen

arXiv:2412.09401v331.597 citations | h-index: 4 | Has Code | CVPR

Originality: Highly original

AI Analysis

SLAM3R provides an end-to-end solution for real-time dense 3D reconstruction, a key challenge in robotics and AR/VR, though it builds on existing neural methods.

The paper tackles the problem of real-time dense 3D scene reconstruction from monocular RGB videos by introducing SLAM3R, which achieves state-of-the-art accuracy and completeness while running at over 20 FPS.

In this paper, we introduce SLAM3R, a novel and effective system for real-time, high-quality, dense 3D reconstruction using RGB videos. SLAM3R provides an end-to-end solution by seamlessly integrating local 3D reconstruction and global coordinate registration through feed-forward neural networks. Given an input video, the system first converts it into overlapping clips using a sliding window mechanism. Unlike traditional pose optimization-based methods, SLAM3R directly regresses 3D pointmaps from RGB images in each window and progressively aligns and deforms these local pointmaps to create a globally consistent scene reconstruction - all without explicitly solving any camera parameters. Experiments across datasets consistently show that SLAM3R achieves state-of-the-art reconstruction accuracy and completeness while maintaining real-time performance at 20+ FPS. Code available at: https://github.com/PKU-VCL-3DV/SLAM3R.
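The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sliding_windows` and the `window` and `stride` parameters are hypothetical, and the abstract does not state the actual clip length or overlap. The sketch only shows how a frame sequence might be split into overlapping clips (clips overlap whenever `stride` is smaller than `window`).

```python
from typing import List


def sliding_windows(num_frames: int, window: int, stride: int) -> List[List[int]]:
    """Split frame indices 0..num_frames-1 into clips of `window` frames.

    Hypothetical helper for illustration; `window` and `stride` are assumed
    parameters, not values taken from the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames <= 0:
        return []
    if num_frames <= window:
        # Video shorter than one window: return a single clip.
        return [list(range(num_frames))]

    starts = list(range(0, num_frames - window + 1, stride))
    # Add a final window so the last frames of the video are covered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [list(range(s, s + window)) for s in starts]


if __name__ == "__main__":
    for clip in sliding_windows(num_frames=9, window=4, stride=2):
        print(clip)
```

In a pipeline like the one the abstract describes, each clip would then be passed to a network that regresses a local pointmap, and the shared frames between neighbouring clips would give the later alignment stage common content to register against.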

