CV, RO · May 8, 2025

Visual Affordance Prediction: Survey and Reproducibility

arXiv:2505.05074v2 · 2 citations · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This work addresses methodological inconsistencies and reproducibility challenges faced by researchers in computer vision and robotics, though it is incremental, building on existing survey and documentation practices.

The paper tackles the problem of inconsistent definitions and reproducibility issues in visual affordance prediction by proposing a unified formulation and introducing the Affordance Sheet to improve transparency and fairness in benchmarks.

Affordances are the potential actions an agent can perform on an object, as observed by a camera. Visual affordance prediction is formulated differently for tasks such as grasping detection, affordance classification, affordance segmentation, and hand pose estimation. This diversity in formulations leads to inconsistent definitions that prevent fair comparisons between methods. In this paper, we propose a unified formulation of visual affordance prediction that accounts for the complete information on the objects of interest and on the interaction of the agent with those objects to accomplish a task. This unified formulation allows us to review disparate visual affordance works comprehensively and systematically, highlighting the strengths and limitations of both methods and datasets. We also discuss reproducibility issues, such as the unavailability of method implementations and of experimental-setup details, which make benchmarks for visual affordance prediction unfair and unreliable. To promote transparency, we introduce the Affordance Sheet, a document that details the solution, datasets, and validation of a method, supporting future reproducibility and fairness in the community.
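The abstract says only that the Affordance Sheet covers a method's solution, datasets, and validation. As a minimal sketch, assuming the sheet can be treated as a structured record with those three sections, one could encode it in Python and flag the entries an author left blank. Every field name below (e.g. `code_url`, `split_definition`, `hardware`), as well as `ExampleNet`, `ExampleSet`, and the placeholder URL, is hypothetical and is not taken from the paper's actual template.

```python
from dataclasses import dataclass, field, fields


# Hypothetical section: what a reader would need to re-run the method.
@dataclass
class SolutionInfo:
    code_url: str = ""
    framework: str = ""
    pretrained_weights: str = ""


# Hypothetical section: one entry per dataset used for training or testing.
@dataclass
class DatasetInfo:
    name: str = ""
    split_definition: str = ""
    preprocessing: str = ""


# Hypothetical section: how the reported numbers were obtained.
@dataclass
class ValidationInfo:
    metrics: list[str] = field(default_factory=list)
    protocol: str = ""
    hardware: str = ""


@dataclass
class AffordanceSheet:
    method_name: str
    solution: SolutionInfo
    datasets: list[DatasetInfo]
    validation: ValidationInfo

    def missing_fields(self) -> list[str]:
        """Return dotted paths of every field that is empty (falsy)."""
        missing: list[str] = []

        def check(prefix: str, section: object) -> None:
            # Walk the dataclass fields of one section and record empty ones.
            for f in fields(section):
                if not getattr(section, f.name):
                    missing.append(f"{prefix}.{f.name}")

        check("solution", self.solution)
        if not self.datasets:
            missing.append("datasets")
        for i, dataset in enumerate(self.datasets):
            check(f"datasets[{i}]", dataset)
        check("validation", self.validation)
        return missing


# Usage with hypothetical values: some details are deliberately left out.
sheet = AffordanceSheet(
    method_name="ExampleNet",
    solution=SolutionInfo(code_url="https://example.org/code", framework="PyTorch"),
    datasets=[DatasetInfo(name="ExampleSet", split_definition="official train/test")],
    validation=ValidationInfo(metrics=["IoU"], protocol="mean and std over 5 runs"),
)
print(sheet.missing_fields())
# → ['solution.pretrained_weights', 'datasets[0].preprocessing', 'validation.hardware']
```

Flagging empty fields mirrors the reproducibility gap the abstract describes (missing implementations and setup details). The real Affordance Sheet may well be a prose document rather than a fixed schema.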
