CV | LG | RO | Mar 12, 2025

2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos

arXiv:2503.09320v3 | 9 citations | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the need for more nuanced affordance prediction in robotics and vision, moving beyond naive object part segmentation to account for task-specific and bimanual interactions. Its contribution is an incremental improvement over existing methods.

The paper tackles the problem of predicting precise, actionable affordance regions from human videos, particularly for bimanual actions, by introducing a dataset and VLM-based model that outperforms baselines in segmentation tasks and demonstrates utility in robotic manipulation.

When interacting with objects, humans effectively reason about which regions of objects are viable for an intended action, i.e., the affordance regions of the object. They can also account for subtle differences in object regions based on the task to be performed and whether one or two hands need to be used. However, current vision-based affordance prediction methods often reduce the problem to naive object part segmentation. In this work, we propose a framework for extracting affordance data from human activity video datasets. Our extracted 2HANDS dataset contains precise object affordance region segmentations and affordance class-labels as narrations of the activity performed. The data also accounts for bimanual actions, i.e., two hands coordinating and interacting with one or more objects. We present a VLM-based affordance prediction model, 2HandedAfforder, trained on the dataset and demonstrate superior performance over baselines in affordance region segmentation for various activities. Finally, we show that our predicted affordance regions are actionable, i.e., can be used by an agent performing a task, through demonstration in robotic manipulation scenarios. Project website: https://sites.google.com/view/2handedafforder
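To make the abstract's description of the data concrete, the following is a minimal Python sketch of what one bimanual affordance sample and a segmentation score could look like. Everything named here is an assumption for illustration: the `AffordanceSample` record, its field names, the toy masks, and the "open bottle" narration are hypothetical and do not reflect the actual 2HANDS schema, and intersection over union (IoU) is used only as a common segmentation measure, not as the metric confirmed by the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

Mask = List[List[int]]  # binary mask, 1 = pixel inside the affordance region


@dataclass
class AffordanceSample:
    """Hypothetical record layout; the real 2HANDS schema may differ."""
    narration: str              # affordance class label given as a narration
    left_mask: Optional[Mask]   # region for the left hand, None if unused
    right_mask: Optional[Mask]  # region for the right hand, None if unused

    @property
    def is_bimanual(self) -> bool:
        # A sample counts as bimanual when both hands have a region.
        return self.left_mask is not None and self.right_mask is not None


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


# Toy 2x3 example (invented values, not taken from the dataset).
sample = AffordanceSample(
    narration="open bottle",
    left_mask=[[1, 1, 0], [1, 1, 0]],
    right_mask=[[0, 0, 1], [0, 0, 0]],
)
predicted_left = [[1, 0, 0], [1, 1, 0]]

print(sample.is_bimanual)                            # True
print(mask_iou(predicted_left, sample.left_mask))    # 0.75
```

Keeping one mask per hand, rather than a single object mask, is what lets such a record express the task-specific and bimanual distinctions the abstract emphasizes.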
