CV · AI · IV · Nov 12, 2025

Task-Aware 3D Affordance Segmentation via 2D Guidance and Geometric Refinement

arXiv:2511.11702v1 · 4 citations · h-index: 23
Originality: Highly original
AI Analysis

This work addresses the problem of enabling embodied agents to interact in complex environments, representing an incremental advance over existing methods that neglect geometric structure.

The paper tackles the challenge of 3D scene-level affordance segmentation from natural language instructions by introducing TASA, a framework that combines 2D semantic guidance with 3D geometric refinement, achieving significant improvements in accuracy and efficiency on the SceneFun3D benchmark.

Understanding 3D scene-level affordances from natural language instructions is essential for enabling embodied agents to interact meaningfully in complex environments. However, this task remains challenging due to the need for semantic reasoning and spatial grounding. Existing methods mainly focus on object-level affordances or merely lift 2D predictions to 3D, neglecting rich geometric structure information in point clouds and incurring high computational costs. To address these limitations, we introduce Task-Aware 3D Scene-level Affordance segmentation (TASA), a novel geometry-optimized framework that jointly leverages 2D semantic cues and 3D geometric reasoning in a coarse-to-fine manner. To improve the affordance detection efficiency, TASA features a task-aware 2D affordance detection module to identify manipulable points from language and visual inputs, guiding the selection of task-relevant views. To fully exploit 3D geometric information, a 3D affordance refinement module is proposed to integrate 2D semantic priors with local 3D geometry, resulting in accurate and spatially coherent 3D affordance masks. Experiments on SceneFun3D demonstrate that TASA significantly outperforms the baselines in both accuracy and efficiency in scene-level affordance segmentation.
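The coarse-to-fine idea in the abstract (2D detections guide view selection, then local 3D geometry refines the mask) can be illustrated with a toy sketch. This is not TASA's implementation: the function names `select_views`, `backproject`, and `refine_mask`, the pinhole camera model, and the radius-plus-normal-agreement region growing are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def select_views(points_per_view, k=2):
    """Hypothetical view selection: keep the k views with the most 2D detections."""
    counts = np.array([len(p) for p in points_per_view])
    order = np.argsort(-counts, kind="stable")
    return [int(i) for i in order[:k] if counts[i] > 0]

def backproject(pixels, depth, K, cam_to_world):
    """Lift (u, v) pixels to world-space 3D points (assumed pinhole model)."""
    pixels = np.asarray(pixels, dtype=float)
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v.astype(int), u.astype(int)]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous coords
    return (cam @ cam_to_world.T)[:, :3]

def refine_mask(cloud, normals, seeds, radius=0.05, min_cos=0.9):
    """Assumed refinement: grow a mask from the seeds over nearby points
    whose unit normals agree with the point currently being expanded."""
    # Snap each lifted seed to its nearest point in the cloud.
    dist = np.linalg.norm(cloud[:, None, :] - seeds[None, :, :], axis=2)
    frontier = np.unique(dist.argmin(axis=0)).tolist()
    mask = np.zeros(len(cloud), dtype=bool)
    mask[frontier] = True
    while frontier:
        i = frontier.pop()
        near = np.linalg.norm(cloud - cloud[i], axis=1) <= radius
        similar = normals @ normals[i] >= min_cos
        new = near & similar & ~mask
        mask |= new
        frontier.extend(np.flatnonzero(new).tolist())
    return mask
```

In this sketch the 2D stage only supplies sparse seed points, and the 3D stage decides the final extent of the mask from geometry alone, so a point that is close to a seed but lies on a differently oriented surface is left out.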
