CV | RO | Jun 4

T-FunS3D: Task-Driven Hierarchical Open-Vocabulary 3D Functionality Segmentation

arXiv:2606.05975
AI Analysis

For robotic applications requiring efficient and actionable perception, T-FunS3D offers a practical balance between segmentation granularity, accuracy, and speed.

T-FunS3D introduces a task-driven hierarchical method for open-vocabulary 3D functionality segmentation that identifies relevant instances and their functional components based on a task description. On SceneFun3D, it performs comparably to the state of the art while running faster and using less memory.

Open-vocabulary 3D functionality segmentation enables robots to localize functional object components in 3D scenes. It is a challenging task that requires spatial understanding and task interpretation. Current open-vocabulary 3D segmentation methods primarily focus on object-level recognition, while scene-wide part segmentation methods attempt to segment the entire scene exhaustively, making them highly resource-intensive and time-consuming. Balancing segmentation performance in terms of granularity, accuracy, and speed remains a challenge. As one step towards alleviating this, we introduce T-FunS3D, a task-driven hierarchical open-vocabulary 3D functionality segmentation method that provides actionable perception for robotic applications. Our method takes as input the 3D point cloud and posed RGB-D images of an indoor scene. We construct an open-vocabulary scene graph by extracting instances and their visual embeddings in the environment. Given a task description, T-FunS3D identifies the most relevant instances in the scene graph and locates their functional components using a vision-language model. Experiments on the SceneFun3D dataset demonstrate that T-FunS3D is comparable to the state of the art in open-vocabulary 3D functionality segmentation, while achieving faster runtime and reduced memory usage.
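
The hierarchical idea in the abstract, narrowing the scene to task-relevant instances before looking for functional parts, can be illustrated with a minimal sketch. It assumes relevance is scored by cosine similarity between a task embedding and each instance embedding; the abstract does not state the scoring rule. The `Instance` class, the `rank_instances` function, and the toy vectors are hypothetical and not taken from the paper. The later step of locating functional components with a vision-language model is not shown.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """One node of a toy scene graph (hypothetical structure)."""
    name: str
    embedding: List[float]  # stand-in for a visual embedding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_instances(task_embedding: List[float],
                   instances: List[Instance],
                   top_k: int = 2) -> List[Instance]:
    """Return the top_k instances most similar to the task embedding.

    Assumption: similarity ranking stands in for the paper's
    (unspecified here) instance-selection step.
    """
    ranked = sorted(
        instances,
        key=lambda inst: cosine(task_embedding, inst.embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy scene graph with made-up 3-dimensional embeddings.
scene = [
    Instance("cabinet", [0.9, 0.1, 0.0]),
    Instance("sofa", [0.0, 1.0, 0.0]),
    Instance("window", [0.1, 0.0, 0.9]),
]

# Made-up embedding for a task such as "open the cabinet drawer".
task = [1.0, 0.0, 0.0]

relevant = rank_instances(task, scene, top_k=1)
print([inst.name for inst in relevant])
```

Only the instances returned by `rank_instances` would be passed on to part-level processing, which is where a task-driven method can save time and memory compared with segmenting every object in the scene exhaustively.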

