CV · AI · Mar 27

GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation

Xujing Tao, Chuxin Wang, Yubo Ai, Zhixin Cheng, Zhuoyuan Li, Liangsheng Liu, Yujia Chen, Xinjun Li, Qiao Li, Wenfei Yang, Tianzhu Zhang

arXiv:2603.26260 · 76.51 citations · h-index: 12

Predicted impact: top 32% in CV · last 90 days · Originality: Highly original

AI Analysis

This work addresses the challenge of segmenting arbitrary categories in 3D scenes, including categories absent from the training data, which matters for applications such as robotics and autonomous systems. It is incremental in that it builds on prior 2D distillation methods, extending them with stronger 3D geometric consistency.

The paper tackles open-vocabulary 3D semantic segmentation. Existing methods rely on 2D models, which limits 3D geometric learning and propagates 2D prediction errors into the 3D output. The authors propose GeoGuide, a framework that uses pretrained 3D models to enforce hierarchical geometry-semantic consistency, and they report superior performance on ScanNet v2, Matterport3D, and nuScenes.

Open-vocabulary 3D semantic segmentation aims to segment arbitrary categories beyond the training set. Existing methods predominantly rely on distilling knowledge from 2D open-vocabulary models. However, aligning 3D features to the 2D representation space restricts intrinsic 3D geometric learning and inherits errors from 2D predictions. To address these limitations, we propose GeoGuide, a novel framework that leverages pretrained 3D models to integrate hierarchical geometry-semantic consistency for open-vocabulary 3D segmentation. Specifically, we introduce an Uncertainty-based Superpoint Distillation module to fuse geometric and semantic features for estimating per-point uncertainty, adaptively weighting 2D features within superpoints to suppress noise while preserving discriminative information to enhance local semantic consistency. Furthermore, our Instance-level Mask Reconstruction module leverages geometric priors to enforce semantic consistency within instances by reconstructing complete instance masks. Additionally, our Inter-Instance Relation Consistency module aligns geometric and semantic similarity matrices to calibrate cross-instance consistency for same-category objects, mitigating viewpoint-induced semantic drift. Extensive experiments on ScanNet v2, Matterport3D, and nuScenes demonstrate the superior performance of GeoGuide.
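
To make two of the ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes per-point uncertainty is already given and turns it into weights with `exp(-uncertainty)`, and it assumes the inter-instance consistency term is a mean squared difference between cosine-similarity matrices. The abstract specifies neither choice, and the function names (`pool_superpoints`, `relation_consistency_loss`) are hypothetical.

```python
import numpy as np


def pool_superpoints(feats_2d, uncertainty, superpoint_ids):
    """Uncertainty-weighted average of per-point 2D features per superpoint.

    feats_2d:       (N, D) features lifted from a 2D model.
    uncertainty:    (N,) non-negative per-point uncertainty (assumed given).
    superpoint_ids: (N,) integer superpoint label of each point.
    Returns an (S, D) array, one row per distinct superpoint id (sorted).
    """
    ids, inverse = np.unique(superpoint_ids, return_inverse=True)
    # Assumed weighting: more uncertain points contribute less.
    weights = np.exp(-np.asarray(uncertainty, dtype=float))
    weight_sums = np.bincount(inverse, weights=weights, minlength=len(ids))
    pooled = np.zeros((len(ids), feats_2d.shape[1]))
    # Scatter-add the weighted features into their superpoint rows.
    np.add.at(pooled, inverse, weights[:, None] * feats_2d)
    return pooled / weight_sums[:, None]


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def relation_consistency_loss(geo_feats, sem_feats):
    """Mean squared gap between geometric and semantic similarity matrices.

    Both inputs hold one feature vector per instance, shapes (K, Dg), (K, Ds).
    The squared-error form is an assumption made for illustration.
    """
    diff = cosine_similarity_matrix(geo_feats) - cosine_similarity_matrix(sem_feats)
    return float(np.mean(diff ** 2))


# Toy usage: three points, two superpoints, two instances.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pooled = pool_superpoints(feats, np.zeros(3), np.array([0, 0, 1]))
loss = relation_consistency_loss(np.eye(2), np.array([[1.0, 0.0], [1.0, 0.0]]))
```

With equal uncertainties the pooling reduces to a plain mean within each superpoint. The loss is zero only when instances that look alike geometrically are equally alike semantically, which is the kind of agreement the Inter-Instance Relation Consistency module is described as encouraging.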
