CL AI LG · May 26, 2023

AdaPlanner: Adaptive Planning from Feedback with Language Models

Haotian Sun, Yuchen Zhuang, Lingkai Kong, Bo Dai, Chao Zhang

arXiv:2305.16653v124.3228 citations · Has Code

Originality: Highly original

AI Analysis

The paper addresses the challenge of improving autonomous agent performance in complex environments, with applications in AI and robotics. It presents a novel method rather than an incremental improvement.

The paper tackles the poor performance of LLM agents on complex sequential decision-making tasks, which it attributes to a lack of adaptive planning. It proposes AdaPlanner, which refines plans based on environmental feedback and outperforms state-of-the-art baselines by 3.73% in ALFWorld and 4.11% in MiniWoB++ while using 2x and 600x fewer samples, respectively.

Large language models (LLMs) have recently demonstrated potential as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily without planning or rely on static plans that cannot adapt to environmental feedback. Consequently, the sequential decision-making performance of LLM agents degrades as problem complexity and plan horizons increase. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback, using both in-plan and out-of-plan refinement strategies. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities. Furthermore, we propose a skill discovery mechanism that leverages successful plans as few-shot exemplars, enabling the agent to plan and refine with fewer task demonstrations. Our experiments in the ALFWorld and MiniWoB++ environments demonstrate that AdaPlanner outperforms state-of-the-art baselines by 3.73% and 4.11% while utilizing 2x and 600x fewer samples, respectively.
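The closed-loop idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the environment (a dict mapping locations to objects), the `stub_planner` function standing in for an LLM call, and the `run_episode` loop are all hypothetical names invented for illustration. The sketch also assumes one simple reading of the two refinement modes: feedback that fits the current plan is used to finish it, while a plan that runs out without success is replaced by a new plan conditioned on the accumulated feedback.

```python
from typing import Dict, List, Optional, Tuple

Plan = List[str]  # an ordered list of locations to search


def stub_planner(known_locations: List[str], feedback: List[str]) -> Plan:
    """Hypothetical stand-in for an LLM call.

    Proposes up to two locations that earlier feedback has not ruled out.
    Each feedback line is assumed to start with the location name.
    """
    ruled_out = {line.split()[0] for line in feedback}
    return [loc for loc in known_locations if loc not in ruled_out][:2]


def run_episode(
    env: Dict[str, List[str]], target: str, max_revisions: int = 3
) -> Tuple[Optional[str], int]:
    """Search for `target`, replanning from feedback when a plan fails.

    Returns (location where the target was found or None, revisions used).
    """
    feedback: List[str] = []
    locations = sorted(env)
    revision = 0
    for revision in range(max_revisions + 1):
        plan = stub_planner(locations, feedback)
        if not plan:
            break  # nothing left to try
        for loc in plan:
            observed = env[loc]  # environment feedback for this step
            if target in observed:
                # Feedback matches the plan's expectation: finish within the plan.
                return loc, revision
            feedback.append(f"{loc} lacks {target}")
        # Plan exhausted without success: loop again so the planner
        # produces a revised plan that takes the feedback into account.
    return None, revision


toy_env = {"cabinet": ["plate"], "drawer": [], "shelf": ["mug"], "table": []}
found_at, revisions = run_episode(toy_env, "mug")
print(found_at, revisions)
```

A static planner would stop after its first two-location plan fails. Here the failure is fed back to the planner, so the second plan skips the locations already ruled out.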
