RO | CV | Oct 15, 2024

PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model

arXiv:2410.11564v2 | 6 citations | h-index: 7 | IROS
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of enabling robots to interact more effectively with objects in the physical world; it is an incremental improvement over existing methods.

The paper tackles 3D affordance understanding for robotic interaction by introducing PAVLM, a framework that integrates vision-language models with point clouds. PAVLM outperforms baseline methods on the 3D-AffordanceNet benchmark and generalizes well to novel tasks.

Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Visual Language Models (VLMs) have excelled in high-level reasoning and long-horizon planning for robotic manipulation, they still fall short in grasping the nuanced physical properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding of point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 models to generate refined context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods for both full and partial point clouds, particularly excelling in its generalization to novel open-world affordance tasks of 3D objects. For more information, visit our project site: pavlm-source.github.io.

