CR · AI · May 25

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

arXiv:2605.26195 · 83.4 · Has Code
Predicted impact: top 10% in CR · last 90 days
Originality: Incremental advance
AI Analysis

For cybersecurity practitioners, this work provides a method to automatically adapt LLM agents to diverse security testing tasks without manual scaffold design, though the gains are incremental over existing approaches.

CyberEvolver is a self-evolving framework for LLM-based cybersecurity agents that iteratively revises the agent's scaffold based on failed execution attempts. Across CTF challenges, vulnerability exploitation, and penetration-testing tasks, it improves the seed agent's success rate by 13.6% on average and outperforms six human-designed agents and two self-improvement methods.

LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, human-designed scaffolds that struggle to adapt across diverse targets and failure modes. We introduce CyberEvolver, a self-evolving cybersecurity agent framework that iteratively revises its own scaffold based on experience from failed execution attempts. Self-evolution in cybersecurity is challenging because the space of possible scaffold changes is largely unstructured, execution feedback is sparse and often obscured by the environment, and low-diversity updates can cause errors to compound over repeated iterations. CyberEvolver addresses these challenges with a four-layer evolvable agent architecture that decomposes scaffold optimization into structured components, a trace-to-diagnosis mechanism that converts noisy execution logs into actionable revision signals, and a population-based beam search strategy that preserves diverse agent variants during evolution. We evaluate CyberEvolver on CTF challenges, vulnerability exploitation, and penetration-testing tasks using four open-source LLMs. Across these settings, CyberEvolver improves the seed agent's success rate by 13.6% on average, and outperforms six human-designed cybersecurity agents as well as two self-improvement methods adapted from other domains. These results suggest that scaffold self-evolution is a promising direction for building adaptive LLM agents for security testing.
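
The abstract names a population-based beam search over agent variants but gives no algorithmic detail. As a rough illustration only, the sketch below shows a generic beam search that keeps several distinct candidates alive per round instead of a single one. It is not the paper's implementation: `Variant`, `toy_mutate`, `toy_score`, the four-integer encoding of a scaffold, and the hidden target are all hypothetical stand-ins, and a real system would score variants by running the agent on tasks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical stand-in for one agent scaffold variant."""
    layers: Tuple[int, ...]  # toy encoding: one integer setting per scaffold layer


def beam_search(
    seed: Variant,
    score: Callable[[Variant], float],
    mutate: Callable[[Variant], List[Variant]],
    beam_width: int = 3,
    iterations: int = 8,
) -> Variant:
    """Generic beam search: keep the top `beam_width` distinct variants each round."""
    beam = [seed]
    best = seed
    for _ in range(iterations):
        # Expand every survivor; parents stay in the pool so a round never regresses.
        candidates = set(beam)
        for parent in beam:
            candidates.update(mutate(parent))
        # Rank distinct candidates; ties are broken by the encoding for determinism.
        ranked = sorted(candidates, key=lambda v: (-score(v), v.layers))
        beam = ranked[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best


def toy_mutate(v: Variant) -> List[Variant]:
    """Toy revision step (assumption): bump one layer's setting by 1, capped at 3."""
    out = []
    for i, x in enumerate(v.layers):
        if x < 3:
            out.append(Variant(v.layers[:i] + (x + 1,) + v.layers[i + 1:]))
    return out


def toy_score(v: Variant) -> float:
    """Toy success rate (assumption): fraction of layers matching a hidden target."""
    target = (2, 1, 3, 0)
    return sum(a == b for a, b in zip(v.layers, target)) / len(target)


best = beam_search(Variant((0, 0, 0, 0)), toy_score, toy_mutate)
print(best.layers, toy_score(best))
```

In this toy setting some single-step revisions do not change the score, so a purely greedy update has no signal to follow on those rounds. Keeping a small population of distinct variants lets the search carry such candidates forward until a later revision pays off, which is the general motivation for beam-style search over single-path updates.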
