CV · Dec 11, 2024

SLGaussian: Fast Language Gaussian Splatting in Sparse Views

Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang

arXiv:2412.08331v3 · 12.810 citations · h-index: 7 · MM

Originality: Incremental advance

AI Analysis

The paper addresses the need for fast, accurate 3D scene understanding in applications like autonomous navigation and AR/VR. It appears incremental, however, because it builds on existing 3DGS and language-embedding techniques.

The paper tackles the problem of 3D semantic field learning from sparse viewpoints, which is challenging for existing methods due to inefficient per-scene optimizations. The proposed SLGaussian method achieves scene inference in under 30 seconds and open-vocabulary querying in 0.011 seconds per query, outperforming existing methods on metrics like IoU and mIoU.
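For readers unfamiliar with the metrics named above, the following is a minimal sketch of how IoU and mIoU are commonly computed for binary segmentation masks. The function names and the empty-mask convention are assumptions made for illustration; the paper's own evaluation protocol may differ.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou(preds, gts):
    """Average the per-mask IoU over paired predictions and ground truths."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))

# Toy 2x2 masks: the prediction covers two pixels, the ground truth one of them.
pred_mask = [[1, 1], [0, 0]]
gt_mask = [[1, 0], [0, 0]]
single = iou(pred_mask, gt_mask)
average = mean_iou([pred_mask, gt_mask], [gt_mask, gt_mask])
```

In this toy case the intersection is one pixel and the union is two, so the first mask pair contributes an IoU of one half to the average.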

3D semantic field learning is crucial for applications like autonomous navigation, AR/VR, and robotics, where accurate comprehension of 3D scenes from limited viewpoints is essential. Existing methods struggle under sparse view conditions, relying on inefficient per-scene multi-view optimizations, which are impractical for many real-world tasks. To address this, we propose SLGaussian, a feed-forward method for constructing 3D semantic fields from sparse viewpoints, allowing direct inference of 3DGS-based scenes. By ensuring consistent SAM segmentations through video tracking and using low-dimensional indexing for high-dimensional CLIP features, SLGaussian efficiently embeds language information in 3D space, offering a robust solution for accurate 3D scene understanding under sparse view conditions. In experiments on two-view sparse 3D object querying and segmentation in the LERF and 3D-OVS datasets, SLGaussian outperforms existing methods in chosen IoU, Localization Accuracy, and mIoU. Moreover, our model achieves scene inference in under 30 seconds and open-vocabulary querying in just 0.011 seconds per query.
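The idea of using a low-dimensional index in place of a high-dimensional feature can be illustrated with a small sketch. It is not the authors' implementation: the table size, the 512-dimensional features, the random vectors standing in for CLIP embeddings, and the name `query_gaussians` are all assumptions made for illustration. Each Gaussian stores only an integer index into a small feature table, so a text query is compared against the table once and the result is mapped back to Gaussians by index.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def query_gaussians(text_feature, feature_table, gaussian_index):
    """Select the Gaussians whose indexed feature best matches the query.

    text_feature:   (D,) query embedding (stand-in for a CLIP text feature)
    feature_table:  (K, D) one high-dimensional feature per segment
    gaussian_index: (N,) integer index into feature_table for each Gaussian
    """
    # Cosine similarity against K table rows instead of N Gaussians.
    sims = normalize(feature_table) @ normalize(text_feature)
    best = int(np.argmax(sims))
    # A cheap integer comparison maps the matched row back to Gaussians.
    return gaussian_index == best, best

# Hypothetical toy scene: 3 segments, 6 Gaussians, 512-dimensional features.
rng = np.random.default_rng(0)
feature_table = rng.normal(size=(3, 512))
gaussian_index = np.array([0, 2, 1, 2, 2, 0])
# A query that lies close to the feature of segment 2.
text_feature = feature_table[2] + 0.01 * rng.normal(size=512)
mask, best = query_gaussians(text_feature, feature_table, gaussian_index)
```

Under these assumptions the per-query cost scales with the number of table rows, not the number of Gaussians, which is one plausible reason an index-based design keeps querying fast.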
