CVAug 28, 2022

Grounded Affordance from Exocentric View

Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

arXiv:2208.13196v217.632 citationsh-index: 43Has Code

Originality Incremental advance

AI Analysis

This work addresses a domain-specific challenge in embodied intelligence by enabling agents to ground affordances from diverse human-object interactions, though it is incremental as it builds on existing affordance grounding methods.

The paper tackles the problem of affordance grounding from exocentric views by proposing a cross-view knowledge transfer framework to learn and transfer affordance knowledge to egocentric images, achieving superior performance over representative models in objective metrics and visual quality.

Affordance grounding aims to locate objects' "action possibilities" regions, which is an essential step toward embodied intelligence. Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels. Human has the ability that transforms the various exocentric interactions into invariant egocentric affordance to counter the impact of interactive diversity. To empower an agent with such ability, this paper proposes a task of affordance grounding from exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision. However, there is some "interaction bias" between personas, mainly regarding different regions and different views. To this end, we devise a cross-view affordance knowledge transfer framework that extracts affordance-specific features from exocentric interactions and transfers them to the egocentric view. Specifically, the perception of affordance regions is enhanced by preserving affordance co-relations. In addition, an affordance grounding dataset named AGD20K is constructed by collecting and labeling over 20K images from $36$ affordance categories. Experimental results demonstrate that our method outperforms the representative models regarding objective metrics and visual quality. Code is released at https://github.com/lhc1224/Cross-view-affordance-grounding.

View on arXiv PDF Code

Similar