CV · Mar 12, 2024

Q-SLAM: Quadric Representations for Monocular SLAM

Berkeley
arXiv:2403.08125v2 · 14 citations · h-index: 41 · CoRL
Originality: Incremental advance
AI Analysis

This work addresses efficiency and accuracy challenges in 3D scene reconstruction for robotics and AR/VR applications, representing an incremental improvement over existing volumetric SLAM methods.

The paper tackles the problem of inefficient volumetric representations in monocular SLAM by decomposing rigid scene components into quadric surfaces, which improves depth estimation accuracy and reduces computational requirements compared to previous NeRF-SLAM systems.

In this paper, we reimagine volumetric representations through the lens of quadrics. We posit that rigid scene components can be effectively decomposed into quadric surfaces. Leveraging this assumption, we replace volumetric representations made of millions of cubes with several quadric planes, which results in more accurate and efficient modeling of 3D scenes in SLAM contexts. First, we use the quadric assumption to rectify noisy depth estimations from RGB inputs. This step significantly improves depth estimation accuracy and allows us to efficiently sample ray points around quadric planes, instead of across the entire volume space as in previous NeRF-SLAM systems. Second, we introduce a novel quadric-decomposed transformer to aggregate information across quadrics. The quadric semantics are not only explicitly used for depth correction and scene decomposition, but also serve as an implicit supervision signal for the mapping network. Through rigorous experimental evaluation, our method exhibits superior performance over other approaches relying on estimated depth, and achieves comparable accuracy to methods utilizing ground truth depth on both synthetic and real-world datasets.
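The abstract does not give implementation details. As a rough illustration of what "sampling ray points around quadric planes" could look like, the sketch below intersects a camera ray with a general quadric written as x^T Q x = 0 in homogeneous coordinates, then places a few samples in a narrow band around the hit depth. The function names (`ray_quadric_depths`, `sample_near_surface`), the symmetric 4x4 matrix `Q`, and the `half_width` band are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def ray_quadric_depths(origin, direction, Q):
    """Sorted positive depths t where origin + t * direction meets x^T Q x = 0.

    Q is assumed to be a symmetric 4x4 matrix (hypothetical parameterisation).
    """
    o = np.append(np.asarray(origin, dtype=float), 1.0)     # homogeneous point
    d = np.append(np.asarray(direction, dtype=float), 0.0)  # homogeneous direction
    # Substituting x = o + t d gives the quadratic a t^2 + b t + c = 0.
    a = d @ Q @ d
    b = 2.0 * (o @ Q @ d)
    c = o @ Q @ o
    if abs(a) < 1e-12:
        # Degenerate case: the equation is linear in t (e.g. Q encodes a plane).
        if abs(b) < 1e-12:
            return np.array([])
        roots = np.array([-c / b])
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return np.array([])  # the ray misses the quadric
        s = np.sqrt(disc)
        roots = np.array([(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)])
    return np.sort(roots[roots > 0])

def sample_near_surface(t_hit, n_samples=8, half_width=0.05):
    """Evenly spaced depths in a band around the hit, instead of the whole ray."""
    return np.linspace(t_hit - half_width, t_hit + half_width, n_samples)

# Usage: unit sphere at the origin, ray starting at z = -3 and looking along +z.
sphere = np.diag([1.0, 1.0, 1.0, -1.0])
hits = ray_quadric_depths([0.0, 0.0, -3.0], [0.0, 0.0, 1.0], sphere)
samples = sample_near_surface(hits[0], n_samples=5, half_width=0.1)
```

Under these assumptions, the number of samples per ray depends on the band width around each quadric, not on the extent of the full volume, which is the efficiency argument the abstract makes.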

