LG, AI, CL | Aug 7, 2025

Reasoning through Exploration: A Reinforcement Learning Framework for Robust Function Calling

Bingguang Hao, Zengzhuang Xu, Maolin Wang, Yuntao Wen, Yicheng Chen, Cunyin Peng, Long Chen, Dong Wang, Xiangyu Zhao, Jinjie Gu, Chenyi Zhuang, Ji Zhang

arXiv:2508.05118v4 | 17.95 citations | h-index: 9

Originality: Highly original

AI Analysis

This work addresses the problem of inefficient function calling in LLMs for AI applications and represents a strong, specific gain rather than a foundational breakthrough.

The paper tackles the challenge of training Large Language Models for robust function calling by balancing exploration and policy optimization. Its 4B-parameter model achieves a new state of the art among models of comparable size on the Berkeley Function Calling Leaderboard.

The effective training of Large Language Models (LLMs) for function calling faces a critical challenge: balancing exploration of complex reasoning paths with stable policy optimization. Standard methods like Supervised Fine-Tuning (SFT) fail to instill robust reasoning, and traditional Reinforcement Learning (RL) struggles with inefficient exploration. We propose EGPO, a new RL framework built upon Group Relative Policy Optimization (GRPO), designed to address this challenge directly. The core of EGPO is an entropy-enhanced advantage function that integrates the entropy of the model's Chain-of-Thought (CoT) into the policy gradient computation. This encourages the generation of diverse reasoning strategies. To maintain optimization direction, the entropy bonus is carefully constrained by a clipping mechanism. Complemented by a strict, binary reward signal, EGPO effectively guides the model towards discovering structured and accurate tool invocation patterns. On the challenging Berkeley Function Calling Leaderboard (BFCL), a 4B-parameter model trained with EGPO sets a new state-of-the-art among models of comparable size, surpassing a range of strong competitors, including GPT-4o and Gemini-2.5.
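The abstract names three ingredients: a group-relative advantage (from GRPO), an entropy bonus computed over the Chain-of-Thought, and a clip that keeps the bonus from reversing the optimization direction. The sketch below is one possible reading of that description, not the authors' formulation. The function names, the coefficient `alpha`, the cap `clip_frac`, and the choice to clip the bonus to a fraction of the advantage's magnitude are all assumptions made for illustration; the abstract does not give the exact formula.

```python
import math
from statistics import mean, pstdev


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def binary_reward(predicted_call, reference_call):
    """Strict binary reward: 1.0 only for an exact match, otherwise 0.0."""
    return 1.0 if predicted_call == reference_call else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def entropy_enhanced_advantages(rewards, cot_entropies, alpha=0.1, clip_frac=0.5):
    """Add a clipped entropy bonus to each group-relative advantage.

    ASSUMPTION: the bonus is alpha * (mean CoT token entropy), capped at
    clip_frac * |advantage|. With clip_frac < 1 the bonus can never flip
    the sign of the advantage. The paper's actual rule may differ.
    """
    base = group_relative_advantages(rewards)
    enhanced = []
    for adv, entropy in zip(base, cot_entropies):
        cap = clip_frac * abs(adv)
        bonus = min(alpha * entropy, cap)
        enhanced.append(adv + bonus)
    return enhanced


# Hypothetical group of four sampled responses to the same prompt.
reference = {"name": "get_weather", "arguments": {"city": "Paris"}}
samples = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "get_weather", "arguments": {"city": "London"}},
    {"name": "search", "arguments": {"query": "Paris weather"}},
    {"name": "get_weather", "arguments": {"city": "Paris"}},
]
rewards = [binary_reward(s, reference) for s in samples]
cot_entropies = [0.2, 5.0, 0.0, 0.2]  # made-up mean CoT entropies per response
advantages = entropy_enhanced_advantages(rewards, cot_entropies)
```

In this reading, a response that fails the binary check keeps a negative advantage no matter how high its reasoning entropy is, while correct responses with more varied reasoning receive a slightly larger positive advantage.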
