RO · CV · Sep 12, 2019

Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks

arXiv:1909.05770v2 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more flexible robot manipulation by enabling affordance-based reasoning without object priors, though it is incremental as it builds on existing region proposal and attention mechanisms.

The paper addresses affordance recognition without object category priors, since reliance on such priors limits generalization to unseen categories. It proposes AffContext, a category-agnostic pipeline that reduces the performance gap between object-agnostic and object-informed methods and integrates with symbolic planning for robot manipulation tasks.

Affordance information about a scene provides important clues as to what actions may be executed in pursuit of meeting a specified goal state. Thus, integrating affordance-based reasoning into symbolic action planning pipelines would enhance the flexibility of robot manipulation. Unfortunately, the top performing affordance recognition methods use object category priors to boost the accuracy of affordance detection and segmentation. Object priors limit generalization to unknown object categories. This paper describes an affordance recognition pipeline based on a category-agnostic region proposal network for proposing instance regions of an image across categories. To guide affordance learning in the absence of category priors, the training process includes the auxiliary task of explicitly inferring the affordances present within a proposal. In addition, a self-attention mechanism trained to interpret each proposal learns to capture rich contextual dependencies across the region. Visual benchmarking shows that the trained network, called AffContext, reduces the performance gap between object-agnostic and object-informed affordance recognition. AffContext is linked to the Planning Domain Definition Language (PDDL) with an augmented state keeper for action planning across temporally spaced goal-oriented tasks. Manipulation experiments show that AffContext can successfully parse scene content to seed a symbolic planner problem specification, whose execution completes the target task. Additionally, task-oriented grasping for cutting and pounding actions demonstrates the exploitation of multiple affordances for a given object to complete specified tasks.
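To make the step of seeding a symbolic planner from parsed scene content more concrete, here is a minimal sketch of how affordance detections might be turned into a PDDL problem specification. It is not the paper's implementation: the `Detection` structure, the `to_pddl_problem` helper, the domain name, the predicate names (`grasp`, `cut`, `contain`, `in`), and the object identifiers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected instance region with its predicted affordances (hypothetical structure)."""
    name: str               # arbitrary instance id, not an object category
    affordances: frozenset  # illustrative labels such as {"grasp", "cut"}


def to_pddl_problem(detections, goal, domain="manipulation", problem="scene"):
    """Build a PDDL problem string whose initial state lists one fact per affordance."""
    objects = " ".join(d.name for d in detections)
    # Sort affordances per object so the generated text is deterministic.
    init = [
        f"({aff} {d.name})"
        for d in detections
        for aff in sorted(d.affordances)
    ]
    return (
        f"(define (problem {problem})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {objects})\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal {goal}))"
    )


# Assumed example scene: one object that can be grasped and used to cut,
# and one object that can contain others.
scene = [
    Detection("obj0", frozenset({"grasp", "cut"})),
    Detection("obj1", frozenset({"contain"})),
]
pddl = to_pddl_problem(scene, goal="(in obj0 obj1)")
print(pddl)
```

Because the objects are identified only by instance ids and affordance facts, a planner domain written against such predicates would not need to know object categories, which is the property the category-agnostic pipeline is aiming for.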

Code Implementations: 1 repo