AI · Feb 4

Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

arXiv:2602.04837v1 · 10 citations · h-index: 6
Originality: Highly original
AI Analysis

This paper addresses the limited capacity of AI agents to improve autonomously, offering researchers and practitioners a novel paradigm rather than an incremental change.

The paper tackles the inefficient utilization of exploratory diversity in open-ended self-improving agents by introducing Group-Evolving Agents (GEA), which treats a group of agents as the evolutionary unit to enable experience sharing. GEA significantly outperforms state-of-the-art self-evolving methods (e.g., 71.0% vs. 56.7% on SWE-bench Verified) and is competitive with human-designed frameworks.

Open-ended self-improving agents can autonomously modify their own structural designs to advance their capabilities and overcome the limits of pre-defined architectures, thus reducing reliance on human intervention. We introduce Group-Evolving Agents (GEA), a new paradigm for open-ended self-improvement that treats a group of agents as the fundamental evolutionary unit, enabling explicit experience sharing and reuse within the group throughout evolution. Unlike existing open-ended self-evolving paradigms that adopt tree-structured evolution, GEA overcomes the inefficient utilization of exploratory diversity caused by isolated evolutionary branches. We evaluate GEA on challenging coding benchmarks, where it significantly outperforms state-of-the-art self-evolving methods (71.0% vs. 56.7% on SWE-bench Verified, 88.3% vs. 68.3% on Polyglot) and matches or exceeds top human-designed agent frameworks (71.8% and 52.0% on two benchmarks, respectively). Analysis reveals that GEA more effectively converts early-stage exploratory diversity into sustained, long-term progress, achieving stronger performance under the same number of evolved agents. Furthermore, GEA exhibits consistent transferability across different coding models and greater robustness, fixing framework-level bugs in 1.4 iterations on average, versus 5 for self-evolving methods.
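
The contrast between tree-structured evolution and group-level experience sharing can be illustrated with a toy sketch. This is not the paper's implementation: the `Agent` class, the `evolve_isolated` and `evolve_group` functions, and the modelling of "experience" as a set of integer skill ids are all hypothetical simplifications chosen only to show why isolated branches may waste what sibling branches have discovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Toy agent (hypothetical): its 'experience' is a set of skill ids."""
    skills: frozenset

    @property
    def score(self) -> int:
        # Assumption for illustration: capability = number of distinct skills.
        return len(self.skills)


def evolve_isolated(parents, new_skills):
    """Tree-style step: each child inherits only its own parent's experience."""
    return [Agent(p.skills | {s}) for p, s in zip(parents, new_skills)]


def evolve_group(parents, new_skills):
    """Group-style step: each child can reuse the pooled group experience."""
    pooled = frozenset().union(*(p.skills for p in parents))
    return [Agent(pooled | {s}) for s in new_skills]


# Three parents that each explored a different direction.
parents = [Agent(frozenset({1})), Agent(frozenset({2})), Agent(frozenset({3}))]
discoveries = [10, 11, 12]  # one new skill found per child in this step

isolated = evolve_isolated(parents, discoveries)
grouped = evolve_group(parents, discoveries)

best_isolated = max(a.score for a in isolated)
best_grouped = max(a.score for a in grouped)
print(best_isolated, best_grouped)  # prints "2 4"
```

In this toy setting, both strategies produce the same number of children, but only the group-style step lets each child build on the diversity explored by every parent. Real agents would share richer artifacts than a set of ids, and how that sharing is performed is specified in the paper, not here.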

