CV · Dec 11, 2024

SLGaussian: Fast Language Gaussian Splatting in Sparse Views

arXiv:2412.08331v3 · 10 citations · h-index: 7
AI Analysis

The work addresses the need for fast, accurate 3D scene understanding in applications such as autonomous navigation and AR/VR. It appears incremental, however, because it builds on existing 3D Gaussian Splatting (3DGS) and language-embedding techniques.

The paper tackles 3D semantic field learning from sparse viewpoints. This setting is hard for existing methods because they depend on slow per-scene, multi-view optimization. The proposed SLGaussian method infers a scene in under 30 seconds and answers an open-vocabulary query in 0.011 seconds. It also outperforms existing methods on chosen IoU, localization accuracy, and mIoU.
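
The IoU and mIoU metrics cited above compare a predicted mask with a ground-truth mask. The sketch below uses the standard definitions on flattened binary masks. The paper's exact evaluation protocol is not described here. That includes how "chosen IoU" picks a mask and whether mIoU is averaged per query or per category. `mask_iou` and `mean_iou` are hypothetical helper names.

```python
from typing import Iterable, Sequence, Tuple

def mask_iou(pred: Sequence[bool], gt: Sequence[bool]) -> float:
    """Intersection-over-union of two flattened binary masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(bool(p and g) for p, g in zip(pred, gt))
    union = sum(bool(p or g) for p, g in zip(pred, gt))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

def mean_iou(pairs: Iterable[Tuple[Sequence[bool], Sequence[bool]]]) -> float:
    """Average IoU over (prediction, ground truth) pairs, e.g. one per query."""
    scores = [mask_iou(p, g) for p, g in pairs]
    if not scores:
        raise ValueError("need at least one mask pair")
    return sum(scores) / len(scores)

# Toy example: a half-overlapping prediction plus an exact match.
pairs = [
    ([True, True, False, False], [True, False, False, False]),  # IoU 0.5
    ([False, True, True, False], [False, True, True, False]),   # IoU 1.0
]
print(mean_iou(pairs))
```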

3D semantic field learning is crucial for applications like autonomous navigation, AR/VR, and robotics, where accurate comprehension of 3D scenes from limited viewpoints is essential. Existing methods struggle under sparse view conditions, relying on inefficient per-scene multi-view optimizations, which are impractical for many real-world tasks. To address this, we propose SLGaussian, a feed-forward method for constructing 3D semantic fields from sparse viewpoints, allowing direct inference of 3DGS-based scenes. By ensuring consistent SAM segmentations through video tracking and using low-dimensional indexing for high-dimensional CLIP features, SLGaussian efficiently embeds language information in 3D space, offering a robust solution for accurate 3D scene understanding under sparse view conditions. In experiments on two-view sparse 3D object querying and segmentation in the LERF and 3D-OVS datasets, SLGaussian outperforms existing methods in chosen IoU, Localization Accuracy, and mIoU. Moreover, our model achieves scene inference in under 30 seconds and open-vocabulary querying in just 0.011 seconds per query.
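
The abstract's efficiency idea is to index high-dimensional CLIP features through a low-dimensional index. The following is a minimal sketch under simplifying assumptions, not the authors' implementation. It assumes each Gaussian stores a single integer segment ID, taken from a SAM mask that video tracking keeps consistent across views. It also assumes each segment's CLIP embedding is stored once in a table, and that the text query is already embedded. The paper's actual index representation may differ. The 3-D toy vectors stand in for real CLIP embeddings, and `query_gaussians` and `clip_table` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def query_gaussians(
    gaussian_ids: List[int],
    clip_table: Dict[int, List[float]],
    text_embedding: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Return positions of Gaussians whose segment matches the text query.

    Hypothetical layout: each Gaussian holds only a small segment ID, and
    the high-dimensional feature lives once per segment in clip_table.
    """
    # Score each segment once (K comparisons) instead of every Gaussian (N >> K).
    matched = {
        sid for sid, feat in clip_table.items()
        if cosine(feat, text_embedding) >= threshold
    }
    # Gather every Gaussian that points at a matching segment.
    return [i for i, sid in enumerate(gaussian_ids) if sid in matched]

# Toy scene: three segments, six Gaussians.
clip_table = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]}
gaussian_ids = [0, 1, 1, 2, 0, 1]
print(query_gaussians(gaussian_ids, clip_table, [0.1, 0.9, 0.0]))
```

Because similarity is computed per segment rather than per Gaussian or per pixel, query cost scales with the number of segments. This is one plausible reason low-dimensional indexing helps with fast querying.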
