CL · Jan 13

Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees

arXiv:2601.08274v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of augmenting LLMs with computational tools for complex reasoning tasks. It offers a novel method for a known bottleneck rather than a foundational advance.

The paper tackles the integration of tool use into long Chain-of-Thought reasoning for Large Language Models, a problem that remains underexplored because of data scarcity and the difficulty of adding tools without degrading reasoning. It introduces DART, a reinforcement learning framework that enables spontaneous tool use without human annotation, and the authors report that it significantly outperforms existing methods on benchmarks such as AIME and GPQA-Diamond.

Tool-Integrated Reasoning has emerged as a key paradigm to augment Large Language Models (LLMs) with computational capabilities, yet integrating tool-use into long Chain-of-Thought (long CoT) remains underexplored, largely due to the scarcity of training data and the challenge of integrating tool-use without compromising the model's intrinsic long-chain reasoning. In this paper, we introduce DART (Discovery And Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees), a reinforcement learning framework that enables spontaneous tool-use during long CoT reasoning without human annotation. DART operates by constructing dynamic rollout trees during training to discover valid tool-use opportunities, branching out at promising positions to explore diverse tool-integrated trajectories. Subsequently, a tree-based process advantage estimation identifies and credits specific sub-trajectories where tool invocation positively contributes to the solution, effectively reinforcing these beneficial behaviors. Extensive experiments on challenging benchmarks like AIME and GPQA-Diamond demonstrate that DART significantly outperforms existing methods, successfully harmonizing tool execution with long CoT reasoning.
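
The abstract does not give the exact form of the tree-based process advantage estimation, so the following is only a minimal sketch of one common way such an estimator can be set up, not DART's actual method. It assumes that each leaf of a rollout tree carries a scalar outcome reward, that the value of a node is the mean reward of the leaves beneath it, and that the advantage of a branch is its value minus its parent's value. The names `Node`, `value`, `branch_advantages`, and `example_tree` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One segment of a rollout. Only leaves carry a final reward."""
    name: str
    uses_tool: bool = False
    reward: Optional[float] = None  # assumed outcome reward, set on leaves
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the rewards of all leaves below (or at) this node."""
    if not node.children:
        return [node.reward if node.reward is not None else 0.0]
    rewards: List[float] = []
    for child in node.children:
        rewards.extend(leaf_rewards(child))
    return rewards


def value(node: Node) -> float:
    """Assumed value estimate: mean leaf reward in the node's subtree."""
    rewards = leaf_rewards(node)
    return sum(rewards) / len(rewards)


def branch_advantages(root: Node) -> Dict[str, float]:
    """Advantage of each branch = value(child) - value(parent)."""
    advantages: Dict[str, float] = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            advantages[child.name] = value(child) - value(parent)
            stack.append(child)
    return advantages


def example_tree() -> Node:
    """Toy tree: one branch continues in plain text, one calls a tool."""
    plain = Node("plain", children=[
        Node("plain_a", reward=0.0),
        Node("plain_b", reward=1.0),
    ])
    tool = Node("tool", uses_tool=True, children=[
        Node("tool_a", uses_tool=True, reward=1.0),
        Node("tool_b", uses_tool=True, reward=1.0),
    ])
    return Node("root", children=[plain, tool])


if __name__ == "__main__":
    for name, adv in sorted(branch_advantages(example_tree()).items()):
        print(f"{name}: {adv:+.2f}")
```

In this toy tree the tool-using branch receives a positive advantage and the plain branch a negative one, which is the kind of signal that would credit a sub-trajectory where tool invocation helped. A real system would also need a rule for choosing where to branch, which this sketch leaves out.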

