CV · RO · Jan 12, 2024

AffordanceLLM: Grounding Affordance from Vision Language Models

arXiv:2401.06341v2 · 57 citations · h-index: 12 · 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
AI Analysis

This work addresses the challenge of affordance grounding for in-the-wild object interaction, offering improved generalization beyond limited training data, though it is incremental in applying existing vision-language models to this specific task.

The paper tackles the problem of affordance grounding, which involves identifying interactive areas on objects, by leveraging knowledge from pretrained vision-language models to improve generalization. The proposed model achieves significant performance gains on the AGD20K benchmark and can ground affordance for unseen objects and actions in random Internet images.

Affordance grounding refers to the task of finding the area of an object with which one can interact. It is a fundamental but challenging task, as a successful solution requires a comprehensive understanding of a scene in multiple aspects, including the detection, localization, and recognition of objects and their parts; the geo-spatial configuration and layout of the scene; 3D shapes and physics; and the functionality and potential interactions of objects and humans. Much of this knowledge is hidden: it lies beyond the image content and the supervised labels of a limited training set. In this paper, we attempt to improve the generalization capability of current affordance grounding by taking advantage of the rich world, abstract, and human-object-interaction knowledge in pretrained large-scale vision language models. On the AGD20K benchmark, our proposed model demonstrates a significant performance gain over competing methods for in-the-wild object affordance grounding. We further demonstrate that it can ground affordance for objects from random Internet images, even if both the objects and the actions are unseen during training. Project site: https://jasonqsy.github.io/AffordanceLLM/
