LG · May 21, 2025

Procedural Environment Generation for Tool-Use Agents

arXiv:2506.11045v2 · 15 citations · h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses the data-curation bottleneck in training tool-use agents, an incremental but practical advance for AI researchers and developers.

The paper tackles the problem of generating synthetic training data for LLM tool-use agents by introducing RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. The results show that models tuned via SFT and RL on this synthetic data improve on a range of tool-use benchmarks and set a new state of the art on two metrics of the NESTFUL dataset.

Although the power of LLM tool-use agents has ignited a flurry of recent research in this area, the curation of tool-use training data remains an open problem, especially for online RL training. Existing approaches to synthetic tool-use data generation tend to be non-interactive and/or non-compositional. We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks, and set the new SoTA for two metrics on the NESTFUL dataset. Further experiments show that downstream performance scales with the amount of RandomWorld-generated training data, opening up the possibility of further improvement through the use of entirely synthetic data.

