AI · Oct 27, 2025

Alita-G: Self-Evolving Generative Agent for Agent Generation

Jiahao Qiu, Xuan Qi, Hongru Wang, Xinzhe Juan, Yimin Wang, Zelin Zhao, Jiayi Geng, Jiacheng Guo, Peihang Li, Jingzhe Shi, Shilong Liu, Mengdi Wang

arXiv:2510.23601v1 · 7 citations · h-index: 9

Originality: Highly original

AI Analysis

This work addresses the need for efficient and accurate domain-specific agents in complex reasoning tasks, representing a novel method rather than an incremental improvement.

The paper tackles the problem of transforming general-purpose agents into domain experts by introducing ALITA-G, a self-evolution framework that generates, abstracts, and curates Model Context Protocol (MCP) tools. It reports state-of-the-art results, including 83.03% pass@1 on GAIA validation, while reducing mean tokens per example by about 15% relative to a strong baseline agent.

Large language models (LLMs) have been shown to perform better when scaffolded into agents with memory, tools, and feedback. Beyond this, self-evolving agents have emerged, but current work largely limits adaptation to prompt rewriting or failure retries. To address this, we present ALITA-G, a self-evolution framework that transforms a general-purpose agent into a domain expert by systematically generating, abstracting, and curating Model Context Protocol (MCP) tools. In this framework, a generalist agent executes a curated suite of target-domain tasks and synthesizes candidate MCPs from successful trajectories. These are then abstracted into parameterized primitives and consolidated into an MCP Box. At inference time, ALITA-G performs retrieval-augmented MCP selection using each tool's description and use cases, then executes an agent equipped with the MCP Executor. Across several benchmarks (GAIA, PathVQA, and Humanity's Last Exam), ALITA-G attains strong gains while reducing computation costs. On GAIA validation, it achieves 83.03% pass@1 and 89.09% pass@3, establishing a new state-of-the-art result while reducing mean tokens per example by approximately 15% relative to a strong baseline agent. ALITA-G thus provides a principled pathway from generalist capability to reusable, domain-specific competence, improving both accuracy and efficiency on complex reasoning tasks.
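The retrieval-augmented MCP selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `MCPEntry` record, the `select_mcps` function, the `top_k` and `min_score` parameters, and the example tools are all hypothetical, and a simple bag-of-words cosine similarity stands in for whatever embedding model the paper actually uses. It shows only the general idea of scoring each tool's description and use case against the incoming task and keeping the best matches.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class MCPEntry:
    """Hypothetical record for one tool stored in the MCP Box."""
    name: str
    description: str
    use_case: str


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_mcps(task: str, box: list[MCPEntry],
                top_k: int = 2, min_score: float = 0.1) -> list[MCPEntry]:
    """Return up to top_k tools whose description and use case best match the task."""
    query = bag_of_words(task)
    scored = [
        (cosine(query, bag_of_words(f"{entry.description} {entry.use_case}")), entry)
        for entry in box
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop weak matches so the agent is not handed irrelevant tools.
    return [entry for score, entry in scored[:top_k] if score >= min_score]


# Illustrative MCP Box; the tool names and texts are invented for this example.
box = [
    MCPEntry("pdf_table_extractor",
             "Extract tables from a PDF document",
             "Read the revenue table from a PDF annual report"),
    MCPEntry("image_region_classifier",
             "Classify tissue regions in a pathology image",
             "Identify the stain type in a histology slide image"),
    MCPEntry("unit_converter",
             "Convert between physical units",
             "Convert miles to kilometers"),
]

task = "Extract the revenue table from this PDF report"
selected = select_mcps(task, box, top_k=1)
print([entry.name for entry in selected])
```

The similarity threshold is one plausible design choice: without it, a task that matches nothing in the box would still receive the top-ranked but irrelevant tools.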
