AI · Sep 11, 2025

How well can LLMs provide planning feedback in grounded environments?

arXiv:2509.09790v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a problem relevant to AI researchers: reducing the need for reward design or demonstrations when learning to plan. It is an incremental advance because it builds on prior work that uses foundation models for planning.

The paper evaluates how well large language models (LLMs) and vision language models (VLMs) provide planning feedback in grounded environments. It finds that they can deliver diverse, high-quality feedback across domains; that larger and reasoning models give more accurate and less biased feedback; and that feedback quality degrades in environments with complex dynamics or continuous state and action spaces.
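This summary does not say how "accuracy" and "bias" are measured. As an illustration only, the sketch below assumes binary feedback is compared against a ground-truth oracle label (for example, from an expert policy). Accuracy is taken as the agreement rate, and bias as the gap between how often the model says "good" and how often the oracle does. The names BinaryJudgment, accuracy and positive_bias are hypothetical and are not the paper's code.

```python
from dataclasses import dataclass


@dataclass
class BinaryJudgment:
    predicted: bool  # feedback model's verdict: was the action/trajectory good?
    oracle: bool     # assumed ground-truth verdict, e.g. from an expert policy


def accuracy(judgments: list[BinaryJudgment]) -> float:
    """Fraction of judgments where the feedback model agrees with the oracle."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(j.predicted == j.oracle for j in judgments) / len(judgments)


def positive_bias(judgments: list[BinaryJudgment]) -> float:
    """Predicted positive rate minus oracle positive rate.

    > 0: the model says "good" more often than it should (lenient);
    < 0: the model is overly critical; 0: no net skew.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    n = len(judgments)
    predicted_rate = sum(j.predicted for j in judgments) / n
    oracle_rate = sum(j.oracle for j in judgments) / n
    return predicted_rate - oracle_rate


if __name__ == "__main__":
    sample = [
        BinaryJudgment(predicted=True, oracle=True),
        BinaryJudgment(predicted=True, oracle=False),
        BinaryJudgment(predicted=True, oracle=True),
        BinaryJudgment(predicted=False, oracle=False),
    ]
    print(accuracy(sample))       # 0.75
    print(positive_bias(sample))  # 0.25
```

Under this assumed definition, a model can score well on accuracy while still being biased. Here it agrees with the oracle on 3 of 4 cases but labels 75% of cases "good" against an oracle rate of 50%, so it leans lenient.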

Learning to plan in grounded environments typically requires carefully designed reward functions or high-quality annotated demonstrations. Recent works show that pretrained foundation models, such as large language models (LLMs) and vision language models (VLMs), capture background knowledge helpful for planning, which reduces the amount of reward design and demonstrations needed for policy learning. We evaluate how well LLMs and VLMs provide feedback across symbolic, language, and continuous control environments. We consider prominent types of feedback for planning including binary feedback, preference feedback, action advising, goal advising, and delta action feedback. We also consider inference methods that impact feedback performance, including in-context learning, chain-of-thought, and access to environment dynamics. We find that foundation models can provide diverse high-quality feedback across domains. Moreover, larger and reasoning models consistently provide more accurate feedback, exhibit less bias, and benefit more from enhanced inference methods. Finally, feedback quality degrades for environments with complex dynamics or continuous state spaces and action spaces.
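To make the five feedback types and three inference settings concrete, here is a minimal sketch. It assumes a toy grid environment with string actions and, for delta action feedback, a continuous action vector. Each feedback type is a small record answering a different question the planner can ask. build_prompt shows one plausible way the inference methods could be toggled: in-context examples, a chain-of-thought instruction, and an optional textual description of the environment dynamics. Every class, field and prompt string here is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

State = tuple[int, int]  # hypothetical: (row, col) position in a toy grid world


@dataclass
class Binary:
    trajectory: Sequence[str]
    is_good: bool          # "Was this behaviour good?"


@dataclass
class Preference:
    traj_a: Sequence[str]
    traj_b: Sequence[str]
    prefers_a: bool        # "Which of two trajectories is better?"


@dataclass
class ActionAdvice:
    state: State
    action: str            # "What should the agent do in this state?"


@dataclass
class GoalAdvice:
    state: State
    subgoal: State         # "What should the agent aim for next?"


@dataclass
class DeltaAction:
    action: list[float]
    delta: list[float]     # "How should this continuous action be adjusted?"

    def corrected(self) -> list[float]:
        """Apply the suggested adjustment element-wise."""
        if len(self.action) != len(self.delta):
            raise ValueError("action and delta must have the same dimension")
        return [a + d for a, d in zip(self.action, self.delta)]


Feedback = Union[Binary, Preference, ActionAdvice, GoalAdvice, DeltaAction]


def build_prompt(
    task: str,
    query: str,
    examples: Sequence[tuple[str, str]] = (),  # in-context (query, answer) pairs
    chain_of_thought: bool = False,
    dynamics: Optional[str] = None,            # optional description of env dynamics
) -> str:
    """Assemble a feedback query for a foundation model (illustrative only)."""
    parts = [f"Task: {task}"]
    if dynamics is not None:
        parts.append(f"Environment dynamics: {dynamics}")
    for q, a in examples:
        parts.append(f"Example query: {q}\nExample answer: {a}")
    parts.append(f"Query: {query}")
    if chain_of_thought:
        parts.append("Think step by step, then give the final answer.")
    else:
        parts.append("Answer directly.")
    return "\n\n".join(parts)
```

For example, DeltaAction([0.5, -1.0], [0.25, 0.5]).corrected() gives [0.75, -0.5]. Keeping the feedback types as separate records makes explicit that they differ in what the model must output: a single bit, a comparison, a discrete action, a target state, or a real-valued correction.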

