CL · Jan 28

PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use

Qihao Wang, Mingzhe Lu, Jiayue Wu, Yue Hu, Yanbing Liu

arXiv:2601.20439v1 · 0.6 · h-index: 1

Originality: Highly original

AI Analysis

This addresses the challenge of robust tool use for LLM-based agents, representing a strong incremental advance in a specific domain.

The paper tackles the problem of weak planning and execution in large language models for multi-turn tool invocation by introducing PEARL, a framework that combines offline tool exploration and online reinforcement learning, achieving a state-of-the-art success rate of 56.5% on the ToolHop benchmark.

Large Language Models show great potential with external tools, but face significant challenges in complex, multi-turn tool invocation. They often exhibit weak planning, tool hallucination, and erroneous parameter generation, and they struggle with robust interaction. To tackle these issues, we present PEARL, a novel framework to enhance LLM planning and execution for sophisticated tool use. PEARL adopts a two-stage approach: an offline phase where the agent explores tools to learn valid usage patterns and failure conditions, and an online reinforcement learning phase. In the online phase, a dedicated Planner is trained via Group Relative Policy Optimization (GRPO) with a carefully designed reward function that provides distinct signals for planning quality. Experiments on the ToolHop and T-Eval benchmarks show PEARL significantly outperforms existing methods, achieving a new state-of-the-art success rate of 56.5% on ToolHop while maintaining a low invocation error rate. Our work marks a key advance in addressing the complex planning challenges of tool use, contributing to the development of more robust and reliable LLM-based agents.
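
For readers unfamiliar with GRPO, the core idea is that several candidate outputs are sampled for the same prompt and each one is scored relative to its own group rather than by a learned value model. The sketch below illustrates only that group-relative normalization step. It is not PEARL's implementation: the function name, the example reward values, and the use of the population standard deviation are assumptions made for illustration, and the abstract does not specify the Planner's reward function.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style advantage).

    rewards: scalar rewards for candidate plans sampled from the same prompt.
    eps: small constant that avoids division by zero when all rewards match.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    # Assumption: population standard deviation; some implementations
    # use the sample standard deviation instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four plans sampled for a single query.
example_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(example_rewards)
```

Plans scoring above the group mean receive positive advantages and are reinforced, while those below the mean receive negative ones. When every plan in a group earns the same reward, all advantages are zero and that group contributes no learning signal.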

