CL, AI, LG · May 3, 2023

Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

arXiv:2305.02412v2 · 43 citations
AI Analysis

This paper addresses the problem of inefficient LLM integration for embodied agents, offering an incremental improvement in instruction-following performance.

The paper tackles the challenge of using large language models (LLMs) for embodied agents by proposing the PET framework to simplify control tasks, resulting in a 15% improvement over state-of-the-art on the AlfWorld benchmark for generalization to human goal specifications.

Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLMs' ability to generate abstract plans to simplify challenging control tasks, either by action scoring or by action modeling (fine-tuning). However, the transformer architecture imposes several constraints that make it difficult for the LLM to serve directly as the agent, e.g. limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose instead to use the knowledge in LLMs to simplify the control problem rather than to solve it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out objects and receptacles in the observation that are irrelevant to the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. On the AlfWorld instruction-following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
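
The division of labor among the three modules can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: in the paper each module draws on a language model, whereas here every LLM query is replaced by a simple hand-written rule. All names (`plan`, `eliminate`, `keyword_relevance`, `Tracker`, `DONE_PHRASES`) and the example task are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional


def plan(task: str) -> List[str]:
    """Plan: turn a task description into ordered sub-tasks.

    Toy stand-in for an LLM planner: split the text on ', then '.
    """
    return [part.strip() for part in task.split(", then ") if part.strip()]


def keyword_relevance(obj: str, sub_task: str) -> float:
    """Toy relevance score (assumption): 1.0 if the object is named in the sub-task."""
    return 1.0 if obj in sub_task.split() else 0.0


def eliminate(
    objects: List[str],
    sub_task: str,
    relevance: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Eliminate: keep only objects scored as relevant to the current sub-task."""
    return [obj for obj in objects if relevance(obj, sub_task) >= threshold]


class Tracker:
    """Track: advance to the next sub-task once the current one is judged done."""

    def __init__(self, sub_tasks: List[str], is_done: Callable[[str, str], bool]):
        self.sub_tasks = sub_tasks
        self.is_done = is_done
        self.index = 0

    @property
    def current(self) -> Optional[str]:
        if self.index < len(self.sub_tasks):
            return self.sub_tasks[self.index]
        return None  # every sub-task has been completed

    def update(self, observation: str) -> Optional[str]:
        if self.current is not None and self.is_done(self.current, observation):
            self.index += 1
        return self.current


# Hypothetical completion phrases standing in for a learned or LLM-based check.
DONE_PHRASES = {
    "take apple": "pick up the apple",
    "heat apple": "heat the apple",
    "put apple in fridge": "put the apple in the fridge",
}


def phrase_is_done(sub_task: str, observation: str) -> bool:
    return DONE_PHRASES[sub_task] in observation


sub_tasks = plan("take apple, then heat apple, then put apple in fridge")
visible = ["apple", "fridge", "sofa", "microwave"]
masked = eliminate(visible, sub_tasks[2], keyword_relevance)

tracker = Tracker(sub_tasks, phrase_is_done)
next_sub_task = tracker.update("You pick up the apple.")
```

In this sketch the low-level actor would receive only the `masked` objects and `tracker.current` as its goal, which is the sense in which the modules simplify the control problem instead of solving it.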

