AI · Apr 17, 2024

Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

Tsinghua
arXiv:2404.11027v1 · 26 citations · h-index: 13
Originality: Highly original
AI Analysis

This work addresses the challenge of enabling LLMs to interact effectively with the physical world for robotic manipulation, offering a novel approach that avoids reliance on pre-defined skills or sub-policies.

The paper tackles the problem of grounding large language models (LLMs) in the physical world for robotic manipulation. It proposes LLM+A, a training-free framework that uses affordance prompting to improve the feasibility of generated plans and control sequences. The authors report substantial performance improvements and easy generalization across environments.

While large language models (LLMs) succeed at a wide range of language processing tasks, they often fail to generate proper control sequences for interacting with the physical world. We find that the main reason is that LLMs are not grounded in the physical world. Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policies, which makes them hard to adapt to new tasks. In contrast, we address the problem directly and explore whether pre-trained LLMs can be prompted to accomplish a series of robotic manipulation tasks in a training-free paradigm. Accordingly, we propose a framework called LLM+A(ffordance), in which the LLM serves as both the sub-task planner (generating high-level plans) and the motion controller (generating low-level control sequences). To ground these plans and control sequences in the physical world, we develop the affordance prompting technique, which stimulates the LLM to 1) predict the consequences of generated plans and 2) generate affordance values for relevant objects. Empirically, we evaluate LLM+A on various language-conditioned robotic manipulation tasks; the results show that our approach substantially improves performance by enhancing the feasibility of generated plans and control sequences, and that it generalizes easily to different environments.
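
To make the two steps of affordance prompting more concrete, here is a minimal Python sketch. It is not the authors' implementation: the prompt wording, the JSON reply schema, and the names `build_affordance_prompt`, `plan_step`, and `fake_llm` are all hypothetical, and the LLM is replaced by a stub that returns a fixed answer. It only illustrates the idea of asking the model for a predicted consequence plus per-part affordance values, then using those values to pick where to act.

```python
import json
from typing import Callable, Dict, List


def build_affordance_prompt(instruction: str, object_parts: List[str]) -> str:
    """Assemble a prompt asking for a sub-task, its consequence, and affordance values.

    The wording and the JSON schema are assumptions made for this sketch.
    """
    parts = ", ".join(object_parts)
    return (
        f"Task: {instruction}\n"
        f"Candidate object parts: {parts}\n"
        "Reply with JSON containing:\n"
        '  "sub_task": the next high-level step,\n'
        '  "consequence": what happens to the object if the step is executed,\n'
        '  "affordances": a value in [0, 1] for each candidate part, where a\n'
        "  higher value means acting on that part better serves the task.\n"
    )


def plan_step(
    instruction: str,
    object_parts: List[str],
    llm: Callable[[str], str],
) -> Dict[str, object]:
    """Query the LLM once and select the part with the highest affordance value."""
    reply = json.loads(llm(build_affordance_prompt(instruction, object_parts)))
    raw = reply.get("affordances", {})
    # Parts missing from the reply default to 0.0; values are clamped to [0, 1].
    affordances = {
        part: min(max(float(raw.get(part, 0.0)), 0.0), 1.0)
        for part in object_parts
    }
    target = max(affordances, key=affordances.get)
    return {
        "sub_task": reply["sub_task"],
        "consequence": reply["consequence"],
        "target_part": target,
        "affordances": affordances,
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a hard-coded, hypothetical reply."""
    return json.dumps(
        {
            "sub_task": "push the block from its left side",
            "consequence": "the block slides to the right",
            "affordances": {"left side": 0.9, "right side": 0.1},
        }
    )


result = plan_step(
    "push the block to the right",
    ["left side", "right side", "top"],
    fake_llm,
)
print(result["target_part"])
```

In a real system the stub would be replaced by a call to an actual model, and the selected part would be handed to a motion controller. The sketch stops at target selection because the abstract does not specify how control sequences are produced.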

