CV · Jul 18, 2024

Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction

arXiv:2407.13368v1 · 1 citation · h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of enabling mobile robots to navigate autonomously and manipulate objects in unknown settings. It appears incremental, since it combines existing methods.

The paper tackles affordance perception for robots in open-world environments, where a robot must distinguish subtle differences between objects in order to get actionable suggestions. It combines an affordance representation, a vision-language model, and human-in-the-loop corrections, and demonstrates this combination in door-opening scenarios.

Mobile robot platforms will increasingly be tasked with activities that involve grasping and manipulating objects in open-world environments. Affordance understanding provides a robot with the means to realise its goals and execute its tasks, e.g. to achieve autonomous navigation in unknown buildings where it has to find doors and ways to open them. To get actionable suggestions, robots need to be able to distinguish subtle differences between objects, as these may result in different action sequences: doorknobs require grasp and twist, while handlebars require grasp and push. In this paper, we improve affordance perception for a robot in an open-world setting. Our contribution is threefold: (1) we provide an affordance representation with precise, actionable affordances; (2) we connect this knowledge base to a foundational vision-language model (VLM) and prompt the VLM for a wider variety of new and unseen objects; (3) we apply a human-in-the-loop for corrections on the output of the VLM. The mix of affordance representation, image detection and a human-in-the-loop is effective for a robot to search for objects to achieve its goals. We have demonstrated this in a scenario of finding various doors and the many different ways to open them.
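
The three parts named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `AFFORDANCES`, `Detection`, `perceive` and `plan_actions` are hypothetical, the VLM is stood in for by a plain callable, and only the two object-to-action mappings (doorknob, handlebar) come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical knowledge base: object type -> ordered action sequence.
# Only these two entries are taken from the abstract.
AFFORDANCES: Dict[str, Tuple[str, ...]] = {
    "doorknob": ("grasp", "twist"),
    "handlebar": ("grasp", "push"),
}


@dataclass
class Detection:
    label: str               # object type assigned to the image region
    corrected: bool = False  # True if a human overrode the VLM label


def perceive(
    image: object,
    vlm_label: Callable[[object], str],
    human_correction: Optional[Callable[[object, str], Optional[str]]] = None,
) -> Detection:
    """Label an image with a (stubbed) VLM, then apply an optional human fix."""
    label = vlm_label(image)
    # The human sees the image and the proposed label, and may return a
    # replacement label or None to accept the proposal.
    fix = human_correction(image, label) if human_correction else None
    if fix is not None and fix != label:
        return Detection(fix, corrected=True)
    return Detection(label)


def plan_actions(detection: Detection) -> Tuple[str, ...]:
    """Look up the action sequence; unknown objects yield no actions."""
    return AFFORDANCES.get(detection.label, ())
```

In this sketch the human correction acts on the label before the knowledge-base lookup, so a single corrected label changes the whole action sequence (for example from grasp and twist to grasp and push). Whether the paper applies corrections at this point is an assumption.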

