CL · Jun 17, 2025

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents

Berkeley
arXiv:2506.14205v1 · 21 citations · h-index: 24 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient task generation in AI agent development, though the advance is incremental because it builds on existing data-synthesis methods.

The paper tackles scalable, cost-effective task generation for generalist computer-use agents by introducing AgentSynth, a pipeline that synthesizes over 6,000 diverse tasks at an average cost of $0.60 per trajectory. On the resulting benchmark, state-of-the-art LLM agents drop from 18% success at difficulty level 1 to 4% at level 6.

We introduce AgentSynth, a scalable and cost-efficient pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents. Leveraging information asymmetry, AgentSynth constructs subtasks that are simple during generation but significantly more challenging when composed into long-horizon tasks, enabling the creation of over 6,000 diverse and realistic tasks. Our pipeline begins with an LLM-based task proposer guided by a persona, followed by an execution agent that completes the task and logs the trajectory. This process is repeated iteratively to form a sequence of subtasks, which are then summarized by a separate agent into a composite task of controllable difficulty. A key strength of AgentSynth is its ability to precisely modulate task complexity by varying the number of subtasks. Empirical evaluations show that state-of-the-art LLM agents suffer a steep performance drop, from 18% success at difficulty level 1 to just 4% at level 6, highlighting the benchmark's difficulty and discriminative power. Moreover, our pipeline achieves a low average cost of $0.60 per trajectory, orders of magnitude cheaper than human annotations. Our code and data are publicly available at https://github.com/sunblaze-ucb/AgentSynth
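The propose, execute, and summarize loop described in the abstract can be illustrated with a minimal Python sketch. This is not the AgentSynth implementation: the names `Subtask`, `synthesize_task`, and the three callables are hypothetical, and the stub functions stand in for the LLM-based proposer, execution agent, and summarizer. It also assumes that difficulty corresponds to the number of chained subtasks, as the abstract suggests.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Subtask:
    """One proposed subtask plus the logged trajectory from executing it."""
    description: str
    trajectory: List[str]


def synthesize_task(
    persona: str,
    n_subtasks: int,
    propose: Callable[[str, List[Subtask]], str],
    execute: Callable[[str], List[str]],
    summarize: Callable[[List[Subtask]], str],
) -> Tuple[str, List[Subtask]]:
    """Chain n_subtasks subtasks, then summarize them into one composite task.

    Assumption: difficulty is controlled by n_subtasks.
    """
    history: List[Subtask] = []
    for _ in range(n_subtasks):
        # The proposer sees the persona and everything done so far.
        description = propose(persona, history)
        # The executor completes the subtask and returns its logged actions.
        trajectory = execute(description)
        history.append(Subtask(description, trajectory))
    # A separate summarizer turns the subtask sequence into one task statement.
    return summarize(history), history


# Hypothetical stand-ins for the LLM-backed agents, for illustration only.
def stub_propose(persona: str, history: List[Subtask]) -> str:
    return f"step {len(history) + 1} for {persona}"


def stub_execute(description: str) -> List[str]:
    return [f"action for: {description}"]


def stub_summarize(subtasks: List[Subtask]) -> str:
    return " then ".join(s.description for s in subtasks)


if __name__ == "__main__":
    task, history = synthesize_task(
        "a data analyst", 3, stub_propose, stub_execute, stub_summarize
    )
    print(task)
    print(len(history), "subtasks logged")
```

Passing the three agents in as callables keeps the loop itself independent of any particular model or environment, so raising `n_subtasks` is the only change needed to request a longer-horizon composite task in this sketch.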

Code Implementations: 1 repo
