LGCLOct 13, 2025

Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?

arXiv:2510.11184v15 citationsh-index: 7
Originality Incremental advance
AI Analysis

This addresses the challenge of domain generalization for LLM-based tool-use systems, which is incremental but important for practical applications.

The paper tackles the problem of cross-domain generalization for tool-augmented reinforcement learning (RL) agents trained only on mathematical tasks, showing that such agents can effectively transfer to other reasoning domains with high performance and token efficiency.

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in reasoning and tool utilization. However, the generalization of tool-augmented reinforcement learning (RL) across diverse domains remains underexplored. In this work, we investigate the cross-domain generalization of an LLM agent equipped with a code interpreter tool, which is exclusively trained on mathematical problem-solving tasks. Despite the restricted training domain, we evaluate the agent's performance across several distinct reasoning domains. The results reveal that RL-based tool usage learned from mathematical tasks can be effectively transferred to complex tasks in other domains, enabling great task performance and high token efficiency. To facilitate this cross-domain transfer, we propose a Tool Generalization Reinforcement Learning (TGRL) framework designed to promote domain-agnostic learning and skill migration, encompassing: (i) a standardized tool interface that abstracts domain-specific nuances through consistent formatting and explicit termination, fostering transferable invocation patterns; (ii) a dual-component reward system that decomposes rewards to incentivize generalizable behaviors like tool efficiency and reasoning abstraction, ensuring alignment and robustness across domain shifts; and (iii) an XML-based prompt template that separates thinking, tool calls, and responses to encourage modular, domain-invariant planning and coherent multi-turn interactions. Extensive experiments across diverse benchmarks validate our approach, achieving state-of-the-art performance and highlighting the cross-domain potential of Tool RL for LLM reasoning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes