CV | Sep 5, 2025

Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting

Sen Wang, Kunyi Li, Siyun Liang, Elena Alegret, Jing Ma, Nassir Navab, Stefano Gasperini

arXiv:2509.05515v1 | 14.45 citations | h-index: 101

Originality: Incremental advance

AI Analysis

This work addresses multi-view inconsistency and background noise in language-feature distillation for 3D scene understanding, with applications such as robotics and AR/VR. It represents an incremental improvement over prior distillation methods.

The paper tackles inconsistent and noisy language-feature aggregation in open-vocabulary 3D segmentation with 3D Gaussian Splatting. It introduces VALA, which combines visibility-aware gating with a streaming weighted geometric median, and reports improved localization and segmentation that surpass existing methods.

Recently, distilling open-vocabulary language features from 2D images into 3D Gaussians has attracted significant attention. Although existing methods achieve impressive language-based interactions of 3D scenes, we observe two fundamental issues: background Gaussians contributing negligibly to a rendered pixel get the same feature as the dominant foreground ones, and multi-view inconsistencies due to view-specific noise in language embeddings. We introduce Visibility-Aware Language Aggregation (VALA), a lightweight yet effective method that computes marginal contributions for each ray and applies a visibility-aware gate to retain only visible Gaussians. Moreover, we propose a streaming weighted geometric median in cosine space to merge noisy multi-view features. Our method yields a robust, view-consistent language feature embedding in a fast and memory-efficient manner. VALA improves open-vocabulary localization and segmentation across reference datasets, consistently surpassing existing works.
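The abstract does not give VALA's exact formulas. The sketch below assumes the standard 3DGS front-to-back compositing weight as the per-ray "marginal contribution", w_i = T_i * alpha_i with transmittance T_i = prod_{j<i} (1 - alpha_j), and a hypothetical fixed threshold `tau` as the visibility gate; the function names are illustrative, and the paper's actual gate may differ. It shows why a background Gaussian behind an opaque foreground ends up with a tiny weight and can be excluded instead of receiving the same feature as the foreground.

```python
import numpy as np


def marginal_contributions(alphas):
    """Return w_i = T_i * alpha_i for the Gaussians hit by one ray.

    `alphas` holds their opacities sorted front to back; T_i is the
    transmittance that reaches Gaussian i, so w_i is its share of the
    composited pixel value.
    """
    alphas = np.asarray(alphas, dtype=float)
    # Shift the cumulative product by one slot so that T_0 = 1.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    return transmittance * alphas


def visibility_gate(weights, tau=0.1):
    """Hypothetical gate: keep only Gaussians whose contribution reaches tau."""
    return np.asarray(weights) >= tau


if __name__ == "__main__":
    # The front Gaussian is fairly opaque; the last one sits in the background.
    w = marginal_contributions([0.75, 0.5, 0.5])
    print(w)                   # contributions 0.75, 0.125, 0.0625
    print(visibility_gate(w))  # the background Gaussian is dropped
```

Only the Gaussians that pass the gate would then receive the pixel's language feature, weighted by w_i, rather than every Gaussian the ray touches.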
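The abstract names a streaming weighted geometric median in cosine space but does not spell out the update. The sketch below swaps in one standard way to track a geometric median online: an incremental proximal-point step on the weighted objective sum_i w_i * ||x - f_i||, with step size w / (running total weight), applied to L2-normalized features and re-projected onto the unit sphere. For unit vectors ||x - f||^2 = 2(1 - cos(x, f)), so Euclidean distance ranks pairs exactly as cosine distance does; reading "cosine space" this way is an assumption. The class `StreamingGeoMedian` is hypothetical, and unlike a batch Weiszfeld solve its result depends on the order in which views arrive.

```python
import numpy as np


class StreamingGeoMedian:
    """Running weighted geometric-median estimate of unit-norm features.

    Hypothetical sketch: each (feature, weight) pair triggers one incremental
    proximal-point step on sum_i w_i * ||x - f_i|| with step size
    weight / total_weight_so_far, then the estimate is re-projected onto
    the unit sphere.
    """

    def __init__(self, dim, eps=1e-12):
        self.m = np.zeros(dim)  # current estimate (unit norm once initialized)
        self.total_w = 0.0      # sum of the weights seen so far
        self.eps = eps

    def update(self, feature, weight=1.0):
        if weight <= 0:
            return self.m       # e.g. a Gaussian rejected by a visibility gate
        f = np.asarray(feature, dtype=float)
        f = f / max(np.linalg.norm(f), self.eps)  # L2-normalize (cosine space)
        if self.total_w == 0.0:                   # first observation
            self.m = f.copy()
            self.total_w = weight
            return self.m
        self.total_w += weight
        step = weight / self.total_w
        diff = f - self.m
        dist = np.linalg.norm(diff)
        if dist <= step:
            # The proximal step of step * ||x - f|| lands exactly on f.
            self.m = f.copy()
        else:
            # Otherwise move a length of `step` toward f: the pull is bounded
            # no matter how far away (how noisy) the new feature is.
            self.m = self.m + step * diff / dist
        self.m = self.m / max(np.linalg.norm(self.m), self.eps)  # back onto sphere
        return self.m


if __name__ == "__main__":
    e1, e2 = np.eye(3)[0], np.eye(3)[1]
    est = StreamingGeoMedian(dim=3)
    # Five views agree on e1; one noisy view points at e2.
    for feat in [e1, e1, e2, e1, e1, e1]:
        est.update(feat, weight=1.0)
    print(est.m)  # ends back on e1
```

Only the estimate and a running weight are stored per Gaussian, which is what makes a streaming scheme memory-efficient. In the demo the estimate returns exactly to e1 after the outlier, whereas a normalized mean of the same six features would settle about 11 degrees away from e1.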

