AIRO Oct 24, 2025

Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning

arXiv:2510.21302v1, 7 citations, h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses reliability issues in code generation for robots in dynamic environments, representing an incremental improvement over existing methods.

The paper tackles the problem of limited environmental grounding in LLM-based code-as-policies for embodied task planning, proposing a neuro-symbolic framework that improves task success rates by 46.2% over baselines and achieves over 86.8% executability of actions.

Recent advances in large language models (LLMs) have enabled the automatic generation of executable code for task planning and control in embodied agents such as robots, demonstrating the potential of LLM-based embodied intelligence. However, these LLM-based code-as-policies approaches often suffer from limited environmental grounding, particularly in dynamic or partially observable settings, leading to suboptimal task success rates due to incorrect or incomplete code generation. In this work, we propose a neuro-symbolic embodied task planning framework that incorporates explicit symbolic verification and interactive validation processes during code generation. In the validation phase, the framework generates exploratory code that actively interacts with the environment to acquire missing observations while preserving task-relevant states. This integrated process enhances the grounding of generated code, resulting in improved task reliability and success rates in complex environments. We evaluate our framework on RLBench and in real-world settings across dynamic, partially observable scenarios. Experimental results demonstrate that our framework improves task success rates by 46.2% over Code-as-Policies baselines and attains over 86.8% executability of task-relevant actions, thereby enhancing the reliability of task planning in dynamic environments.
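The verify-then-explore loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Step` class, the `verify` and `run_policy` functions, the predicate names, and the three-valued state (True, False, or None for unobserved) are all hypothetical and chosen only for illustration. The sketch also assumes that the exploratory `observe` call is read-only, which is one simple way to keep task-relevant state intact while acquiring missing observations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical symbolic state: predicate name -> True / False / None (unobserved).
State = Dict[str, Optional[bool]]


@dataclass
class Step:
    """One action of a generated policy with symbolic pre- and post-conditions."""
    name: str
    preconditions: List[str]
    effects: Dict[str, bool] = field(default_factory=dict)


def verify(step: Step, state: State) -> Dict[str, List[str]]:
    """Symbolic check: split the preconditions into unknown and violated ones."""
    unknown = [p for p in step.preconditions if state.get(p) is None]
    violated = [p for p in step.preconditions if state.get(p) is False]
    return {"unknown": unknown, "violated": violated}


def run_policy(
    steps: List[Step],
    state: State,
    observe: Callable[[str], Optional[bool]],
) -> List[str]:
    """Verify each step, explore to fill unknown predicates, then execute or reject."""
    log: List[str] = []
    for step in steps:
        # Interactive validation: query the environment for missing observations.
        # `observe` is assumed to be read-only, so it does not disturb the task state.
        for pred in verify(step, state)["unknown"]:
            state[pred] = observe(pred)
            log.append(f"explore:{pred}")
        # Re-run the symbolic check with the newly acquired observations.
        report = verify(step, state)
        if report["violated"] or report["unknown"]:
            log.append(f"reject:{step.name}")
            break
        # Execution is simulated here by applying the step's symbolic effects.
        state.update(step.effects)
        log.append(f"execute:{step.name}")
    return log


# Toy example: the agent knows the drawer is open but has not seen inside it.
world = {"drawer_open": True, "cup_in_drawer": True}
state: State = {"drawer_open": True, "cup_in_drawer": None}
steps = [Step("pick_cup", ["drawer_open", "cup_in_drawer"], {"holding_cup": True})]
log = run_policy(steps, state, lambda pred: world.get(pred))
print(log)
```

In this toy run the unobserved predicate `cup_in_drawer` triggers one exploratory query before `pick_cup` is executed. If the query had returned False or None, the step would have been rejected instead of executed, which is the point at which a real system would ask the LLM to regenerate the code.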

