CL · Oct 8, 2025

Adaptive Tool Generation with Models as Tools and Reinforcement Learning

arXiv:2510.06825v2
Originality: Incremental advance
AI Analysis

This work addresses scalability and reliability problems faced by developers and researchers who train tool-augmented language models. It is an incremental advance: live API calls are replaced with simulated tool observations during training.

The paper tackles the scalability and reliability issues of tool-augmented language models by proposing MTR, a simulation-first training framework that learns from structured traces without live API access. MTR achieves Exact Match scores competitive with live-API systems on multi-hop QA benchmarks such as HotpotQA and performs especially well on reasoning-intensive tasks.
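Exact Match (EM), the headline metric here, is usually computed after light answer normalization. The sketch below assumes the SQuAD-style normalization (lowercase, strip punctuation, drop articles, collapse whitespace) that is common for HotpotQA-style benchmarks; the summary does not say which evaluation script MTR uses, and the function names are illustrative only.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, strip punctuation,
    drop English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Dataset-level EM as a percentage (0-100)."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


print(em_score(["The Eiffel Tower!", "Lyon"], [["Eiffel Tower"], ["Marseille"]]))  # prints 50.0
```

EM is strict: "Paris, France" scores 0 against a gold answer of "Paris", which is one reason multi-hop QA papers often report F1 alongside it.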

Tool-augmented language models have demonstrated strong capabilities, but their reliance on live API access creates scalability and reliability challenges during training and deployment. We propose MTR, a simulation-first training framework for tool-augmented reasoning. Instead of relying on live APIs, MTR learns from complete ReAct traces with schema-validated, simulated observations. Our approach operates through a multi-agent architecture where a ToolMaker generates task-specific, OpenAI-compatible tool interfaces, an AutoAgent produces structured think-act-observe sequences, and a ToolActor simulates realistic responses. Training proceeds in two stages: Stage-1 Supervised Fine-Tuning (SFT) teaches 'trace grammar' from complete reasoning sequences; Stage-2 Group Relative Policy Optimization (GRPO) optimizes strategy with a composite trace reward that balances answer correctness and internal consistency. Across four multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA, Bamboogle), MTR attains Exact Match (EM) scores competitive with live-API systems and excels on reasoning-intensive tasks, suggesting that effective tool reasoning can be learned from structured traces without live interactions.
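The Stage-2 signal can be illustrated with a small Python sketch: a composite reward that mixes answer correctness with a trace-consistency score, followed by the group-relative advantage that GRPO computes by standardizing each rollout's reward against the other rollouts sampled for the same question. Everything MTR-specific here is hypothetical: the 0.8/0.2 weights, the `TOOLS` registry (a stand-in for the "required" fields of OpenAI-style tool schemas), and the consistency proxy (share of Act steps that call a declared tool with all required arguments). The abstract does not define these terms, and the policy-gradient update and KL penalty are omitted.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class Step:
    """One Act step of a ReAct trace: the tool called and its arguments."""
    tool_name: str
    arguments: dict = field(default_factory=dict)


# Hypothetical tool registry: tool name -> required argument names.
TOOLS = {"search_wiki": ["query"], "lookup_entity": ["name"]}


def consistency_score(steps: list[Step], tools: dict) -> float:
    """Hypothetical consistency proxy: fraction of steps that call a declared
    tool and supply every required argument."""
    if not steps:
        return 0.0
    valid = sum(
        1 for s in steps
        if s.tool_name in tools and all(a in s.arguments for a in tools[s.tool_name])
    )
    return valid / len(steps)


def composite_reward(pred: str, gold: str, steps: list[Step], tools: dict,
                     w_answer: float = 0.8, w_consist: float = 0.2) -> float:
    """Weighted sum of answer correctness and trace consistency (weights assumed)."""
    correct = 1.0 if pred.strip().lower() == gold.strip().lower() else 0.0
    return w_answer * correct + w_consist * consistency_score(steps, tools)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for the same question, gold answer "paris".
good = [Step("search_wiki", {"query": "capital of France"})]
bad = [Step("unknown_tool", {})]
rewards = [
    composite_reward("Paris", "paris", good, TOOLS),  # correct, consistent
    composite_reward("Paris", "paris", bad, TOOLS),   # correct, inconsistent
    composite_reward("Lyon", "paris", good, TOOLS),   # wrong, consistent
    composite_reward("Lyon", "paris", bad, TOOLS),    # wrong, inconsistent
]
print(rewards, group_relative_advantages(rewards))
```

Because each advantage is measured relative to sibling rollouts, no learned value network is needed, and a trace that answers correctly but calls an undeclared tool still ranks below a fully consistent correct trace.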
