AI · Sep 21, 2025

MCTS-EP: Empowering Embodied Planning with Online Preference Optimization

Hang Xu, Zang Yu, Yehui Tang, Pengbo Hu, Yuhao Tang, Hao Dong

arXiv:2509.17116v15.81 citations · h-index: 1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient embodied planning for AI agents, representing an incremental improvement through integration of existing methods.

The paper tackles the problem of training embodied agents by introducing MCTS-EP, an online learning framework that combines LLMs with MCTS. It achieves state-of-the-art performance, with success rates of up to 92% in ALFWorld, and reduces average interaction steps in visual ALFWorld from 18.7/19.5 to 10.2/9.9.

This paper introduces MCTS-EP, an online learning framework that combines large language models (LLMs) with Monte Carlo Tree Search (MCTS) for training embodied agents. MCTS-EP integrates three key components: MCTS-guided exploration for preference data collection, an efficient multi-modal reasoning mechanism, and an iterative training pipeline based on preference optimization. We theoretically prove that MCTS-EP achieves better performance bounds than conventional on-policy algorithms when the loss function is strongly convex, and demonstrate that it can be formulated as a search-enhanced variant of GAIL. MCTS-EP achieves state-of-the-art performance across several benchmarks. In ALFWorld, it achieves 92% and 87% success rates for textual and visual tasks, respectively. In WebShop, it reaches an average reward of 0.81. MCTS-EP also reduces average interaction steps from 18.7/19.5 to 10.2/9.9 steps in visual ALFWorld. Code available at: https://github.com/xuhang-2/Embodied-Agent-Planning
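
To make the first and third components more concrete, the following is a minimal sketch of how tree statistics could be turned into preference pairs and scored with a preference loss. It is not taken from the paper or its repository: the `Node` class, the UCT exploration constant, the `min_gap` threshold, and the use of a DPO-style loss are all assumptions chosen for illustration, and the abstract does not say which preference objective MCTS-EP actually uses.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node for one candidate action."""
    action: str
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

    @property
    def q(self) -> float:
        # Mean return observed through this node (0.0 if never visited).
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(parent_visits: int, child: Node, c: float = 1.4) -> float:
    """Standard UCT: exploit the mean value, explore rarely visited children."""
    if child.visits == 0:
        return math.inf  # always try unvisited actions first
    return child.q + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, c: float = 1.4) -> Node:
    return max(parent.children, key=lambda ch: uct_score(parent.visits, ch, c))


def preference_pairs(parent: Node, min_gap: float = 0.2) -> list:
    """Assumed heuristic: among visited siblings, prefer the action whose
    mean value exceeds the other's by at least min_gap."""
    visited = [ch for ch in parent.children if ch.visits > 0]
    return [
        (a.action, b.action)
        for a in visited
        for b in visited
        if a.q - b.q >= min_gap
    ]


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss for one pair: -log(sigmoid(beta * margin)).
    Used here as a stand-in for the paper's preference objective."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))


# Toy example with made-up statistics for three candidate actions.
root = Node("root", visits=12, children=[
    Node("open fridge", visits=8, value_sum=6.0),
    Node("go to sink", visits=4, value_sum=1.0),
    Node("look", visits=0),
])
pairs = preference_pairs(root)
next_action = select_child(root).action
```

In this toy tree, the unvisited action is selected for expansion first, and the only pair that clears the gap threshold ranks "open fridge" above "go to sink". In an online loop of the kind the abstract describes, such pairs would be collected during search and then used to update the policy before the next round of exploration.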
