CL · AI · Jun 29, 2025

Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning

arXiv:2506.23127v1 · 9 citations · h-index: 11
Originality Highly original
AI Analysis

This work addresses the difficulty LLMs have with interactive planning in partially observable environments, a problem relevant to robotics and AI agents. It represents a strong, specific gain rather than a broad paradigm shift.

The paper tackles the challenge of enabling LLMs to perform embodied task planning by developing Embodied Planner-R1, a reinforcement learning framework that achieves completion rates of 97.78% on ALFWorld and 79.92% on ScienceWorld, surpassing prior methods.

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they face significant challenges in embodied task planning scenarios that require continuous environmental understanding and action generation. Existing approaches generate open-loop action scripts based on static knowledge, making it difficult to learn causal relationships between actions and environmental feedback, particularly in partially observable environments. We introduce Embodied Planner-R1, a novel outcome-driven reinforcement learning framework that enables LLMs to develop interactive capabilities through autonomous exploration with minimal supervision. Our framework incorporates three key innovations: (1) pure reinforcement learning with group rollout, requiring no human annotations and incorporating in-environment interaction through parallel exploration; (2) a completion-driven sparse reward; and (3) Interactive Policy Optimization (IPO) for efficient learning from grouped trajectories. Across two challenging text-based embodied planning benchmarks, Embodied Planner-R1 achieves completion rates of 97.78% on ALFWorld and 79.92% on ScienceWorld, surpassing prior methods by a large margin, and suffers only a 3.66% drop in previously unseen environments, evidencing strong generalization.
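The abstract does not spell out how the completion-driven sparse reward and the grouped trajectories are combined, so the following is only a minimal sketch under stated assumptions. It assumes a binary reward given once per episode and a group-relative normalization in the style of GRPO. The names `completion_reward` and `group_relative_advantages` are hypothetical, and this is not the paper's IPO objective.

```python
from statistics import mean, pstdev


def completion_reward(task_completed: bool) -> float:
    """Sparse outcome reward (assumed binary): 1.0 only if the task was completed."""
    return 1.0 if task_completed else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts of the same task.

    Assumption: a GRPO-style normalization (subtract the group mean, divide by
    the group standard deviation). The paper's IPO may differ in its details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four parallel rollouts for one task: two succeed, two fail.
outcomes = [True, False, False, True]
rewards = [completion_reward(done) for done in outcomes]
advantages = group_relative_advantages(rewards)
```

Under this sketch, successful rollouts receive positive advantages and failed ones negative advantages. A group in which every rollout has the same outcome yields all-zero advantages and so contributes no learning signal.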
