Chengdi Ma

h-index: 4
Papers: 2

2 Papers

AI · Aug 26, 2025 · Code
MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use

Weikang Zhao, Xili Wang, Chengdi Ma et al.

With the recent rapid advancement of Agentic Intelligence, agentic tool use in LLMs has become increasingly important. During multi-turn interactions between agents and users, the dynamic, uncertain, and stochastic nature of user demands poses significant challenges to the agent's tool invocation capabilities. Agents are no longer expected to simply call tools to deliver a result; rather, they must iteratively refine their understanding of user needs through communication while simultaneously invoking tools to resolve user queries. Existing reinforcement learning (RL) approaches for tool use do not include genuinely dynamic users in the RL training process. To bridge this gap, we introduce MUA-RL (Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use), a novel reinforcement learning framework that, for the first time in the field of agentic tool use, integrates LLM-simulated users into the reinforcement learning loop. MUA-RL aims to let models learn autonomously to communicate with users efficiently and to use various tools to solve practical problems in dynamic multi-turn interactions. We evaluate on several multi-turn tool-use benchmarks (see Figure 1). Specifically, MUA-RL-32B achieves 67.3 on TAU2 Retail, 45.4 on TAU2 Airline, 28.3 on TAU2 Telecom, 28.4 on BFCL-V3 Multi Turn, and 82.5 on ACEBench Agent, outperforming or matching larger open-source models such as DeepSeek-V3-0324 and Qwen3-235B-A22B in non-thinking settings.

12.2 · NA · May 5
A high-order rectilinear Lagrangian method based on the geometric conservation law

Xun Wang, Chengdi Ma

This paper presents a mesh-moving strategy for a high-order Lagrangian method on quadrilateral meshes. The method rests on two ingredients: the principle of area-conservative linearization and the asymptotic properties of the velocity. The former strictly satisfies the requirements of the geometric conservation law, while the latter guarantees high-order accuracy. Two smooth vortex test cases verify the feasibility of the proposed scheme.