CVAug 23, 2024

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao

arXiv:2408.13024v217.823 citationsh-index: 19

Originality Incremental advance

AI Analysis

This addresses a generalization issue in 3D affordance grounding for robotics applications, representing an incremental improvement over existing methods.

The paper tackles the problem of poor generalization in 3D object affordance grounding due to inconsistent geometric structures between 3D objects and human-object interaction images, by proposing the MIFAG framework that learns invariant affordance knowledge from multiple images within the same category, and reports outperforming state-of-the-art methods on the new MIPA benchmark.

3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics. Recent advances tackle this problem via learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the \textbf{M}ulti-\textbf{I}mage Guided Invariant-\textbf{F}eature-Aware 3D \textbf{A}ffordance \textbf{G}rounding (\textbf{MIFAG}) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance Knowledge Extraction Module (\textbf{IAM}) utilizes an iterative updating strategy to gradually extract aligned affordance knowledge from multiple images and integrate it into an affordance dictionary. Then, the Affordance Dictionary Adaptive Fusion Module (\textbf{ADM}) learns comprehensive point cloud representations that consider all affordance candidates in multiple images. Besides, the Multi-Image and Point Affordance (\textbf{MIPA}) benchmark is constructed and our method outperforms existing state-of-the-art methods on various experimental comparisons. Project page: \url{https://goxq.github.io/mifag}

View on arXiv PDF

Similar