AI CL BM Jul 1, 2025

STELLA: Self-Evolving LLM Agent for Biomedical Research

Ruofan Jin, Zaixi Zhang, Mengdi Wang, Le Cong

arXiv:2507.02004v1 11 citations h-index: 9

Originality Highly original

AI Analysis

STELLA addresses a key limitation for biomedical researchers: most AI agents depend on static, manually curated toolsets. By letting agents adapt and scale dynamically, it represents a significant advance beyond static systems.

The paper tackles the fragmented biomedical research landscape by introducing STELLA, a self-evolving AI agent that autonomously improves its own capabilities. It reports state-of-the-art accuracy on biomedical benchmarks, including approximately 26% on Humanity's Last Exam: Biomedicine, and outperforms leading models by up to 6 percentage points.

The rapid growth of biomedical data, tools, and literature has created a fragmented research landscape that outpaces human expertise. While AI agents offer a solution, they typically rely on static, manually curated toolsets, limiting their ability to adapt and scale. Here, we introduce STELLA, a self-evolving AI agent designed to overcome these limitations. STELLA employs a multi-agent architecture that autonomously improves its own capabilities through two core mechanisms: an evolving Template Library for reasoning strategies and a dynamic Tool Ocean that expands as a Tool Creation Agent automatically discovers and integrates new bioinformatics tools. This allows STELLA to learn from experience. We demonstrate that STELLA achieves state-of-the-art accuracy on a suite of biomedical benchmarks, scoring approximately 26% on Humanity's Last Exam: Biomedicine, 54% on LAB-Bench: DBQA, and 63% on LAB-Bench: LitQA, outperforming leading models by up to 6 percentage points. More importantly, we show that its performance systematically improves with experience; for instance, its accuracy on the Humanity's Last Exam benchmark almost doubles with increased trials. STELLA represents a significant advance towards AI Agent systems that can learn and grow, dynamically scaling their expertise to accelerate the pace of biomedical discovery.
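
The two mechanisms named in the abstract, a Template Library and a Tool Ocean that grows on demand, can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class names mirror the abstract's terms, but every method, the `create_tool` stub, and the tool names `search_db` and `summarize` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolOcean:
    """Hypothetical registry of callable tools, keyed by name."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def has(self, name: str) -> bool:
        return name in self.tools

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn


@dataclass
class TemplateLibrary:
    """Hypothetical store of tool sequences that were used before."""
    templates: Dict[str, List[str]] = field(default_factory=dict)

    def lookup(self, task_type: str) -> Optional[List[str]]:
        return self.templates.get(task_type)

    def store(self, task_type: str, steps: List[str]) -> None:
        self.templates[task_type] = list(steps)


def create_tool(name: str) -> Callable[[str], str]:
    """Stand-in for a tool creation agent (assumption): returns a stub.

    A real system would discover and wrap an actual bioinformatics tool.
    """
    def tool(query: str) -> str:
        return f"{name}({query})"
    return tool


def solve(task_type: str, query: str, plan: List[str],
          ocean: ToolOcean, library: TemplateLibrary) -> List[str]:
    # Reuse a stored template when one exists; otherwise use the given plan.
    steps = library.lookup(task_type) or plan
    outputs = []
    for name in steps:
        # Grow the tool registry when a required tool is missing.
        if not ocean.has(name):
            ocean.register(name, create_tool(name))
        outputs.append(ocean.tools[name](query))
    # This toy version always stores the steps; a real system would
    # presumably store them only after the result has been validated.
    library.store(task_type, steps)
    return outputs
```

In this sketch, a first call such as `solve("variant_lookup", "BRCA1", ["search_db", "summarize"], ocean, library)` creates both stub tools and records the plan, so a later call for the same task type can pass an empty plan and still run the stored steps. That reuse is the sense in which the toy agent "learns from experience".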

