RO AIMar 27, 2025

Embodied Long Horizon Manipulation with Closed-loop Code Generation and Incremental Few-shot Adaptation

Yuan Meng, Xiangtong Yao, Haihui Ye, Yirui Zhou, Shengqiang Zhang, Zhenguo Sun, Xukun Li, Zhenshan Bing, Alois Knoll

arXiv:2503.21969v35.72 citationsh-index: 25Has Code

Originality Incremental advance

AI Analysis

This addresses the challenge of generalizing robotic manipulation to unseen scenarios without large task-specific datasets, though it builds incrementally on existing LLM-based planning approaches.

The paper tackles the problem of embodied long-horizon manipulation by discarding pretrained low-level policies and instead using an LLM to directly generate executable code plans in a closed-loop framework with incremental few-shot learning and feedback, achieving state-of-the-art performance on 30+ diverse seen and unseen tasks.

Embodied long-horizon manipulation requires robotic systems to process multimodal inputs-such as vision and natural language-and translate them into executable actions. However, existing learning-based approaches often depend on large, task-specific datasets and struggle to generalize to unseen scenarios. Recent methods have explored using large language models (LLMs) as high-level planners that decompose tasks into subtasks using natural language and guide pretrained low-level controllers. Yet, these approaches assume perfect execution from low-level policies, which is unrealistic in real-world environments with noise or suboptimal behaviors. To overcome this, we fully discard the pretrained low-level policy and instead use the LLM to directly generate executable code plans within a closed-loop framework. Our planner employs chain-of-thought (CoT)-guided few-shot learning with incrementally structured examples to produce robust and generalizable task plans. Complementing this, a reporter evaluates outcomes using RGB-D and delivers structured feedback, enabling recovery from misalignment and replanning under partial observability. This design eliminates per-step inference, reduces computational overhead, and limits error accumulation that was observed in previous methods. Our framework achieves state-of-the-art performance on 30+ diverse seen and unseen long-horizon tasks across LoHoRavens, CALVIN, Franka Kitchen, and cluttered real-world settings.

View on arXiv PDF Code

Similar