CRAI · May 24

APT-Agent: Automated Penetration Testing using Large Language Models

arXiv:2605.2494937.2
AI Analysis

For cybersecurity professionals, APT-Agent offers a more reliable and more automated approach to penetration testing, one that reduces the cognitive burden on human testers.

APT-Agent is an LLM-driven penetration testing framework that achieves an 84.29% end-to-end exploitation success rate on Metasploitable 2, outperforming Script Kiddie (48.57%) and PentestGPT (18.57%). It does so by addressing two weaknesses of LLM agents: hallucinated commands and loss of long-term context.
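
The three success rates are easier to compare as percentage-point gaps and ratios. They also match 59/70, 34/70 and 13/70 to two decimal places, which would fit, for example, ten runs against each of the seven services in the evaluation. The summary does not state the number of runs, so ASSUMED_TRIALS below is an assumption, and implied_successes and is_consistent are hypothetical helpers written only for this check.

```python
# Reported end-to-end exploitation success rates (%), as given in the summary.
REPORTED = {"APT-Agent": 84.29, "Script Kiddie": 48.57, "PentestGPT": 18.57}

# ASSUMPTION: 70 attempts in total (e.g. 10 per service x 7 services).
# The summary does not state the trial count.
ASSUMED_TRIALS = 70


def implied_successes(rate_pct: float, trials: int) -> int:
    """Nearest whole number of successful runs for a rate over `trials`."""
    return round(rate_pct / 100 * trials)


def is_consistent(rate_pct: float, trials: int) -> bool:
    """True if some integer k/trials rounds to the reported 2-decimal rate."""
    k = implied_successes(rate_pct, trials)
    return abs(round(100 * k / trials, 2) - rate_pct) < 1e-9


for name, rate in REPORTED.items():
    k = implied_successes(rate, ASSUMED_TRIALS)
    print(f"{name}: {k}/{ASSUMED_TRIALS} runs, "
          f"consistent={is_consistent(rate, ASSUMED_TRIALS)}")

# Compare APT-Agent against each baseline in absolute and relative terms.
base = REPORTED["APT-Agent"]
for name in ("Script Kiddie", "PentestGPT"):
    gap = round(base - REPORTED[name], 2)
    ratio = round(base / REPORTED[name], 2)
    print(f"vs {name}: +{gap} percentage points, {ratio}x")
```

Run as-is, this reports 59, 34 and 13 implied successes, gaps of 35.72 and 65.72 percentage points, and ratios of roughly 1.74x and 4.54x. The gaps and ratios do not depend on the assumed trial count; only the implied success counts do.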

Penetration testing is essential to securing modern web infrastructures, yet traditional manual methods struggle to keep pace with the scale and complexity of these infrastructures. Large Language Models (LLMs) offer new opportunities for automating these tasks, but existing approaches face two persistent challenges: hallucination of technical entities and insufficient long-term contextual memory. To address these issues, we present APT-Agent, a fully automated LLM-driven penetration testing framework that systematically orchestrates reconnaissance, exploitation, and exfiltration. APT-Agent introduces a hybrid rectification module to recover hallucinated commands and a command-specific memory architecture to preserve operational context across multi-step attack sequences. We evaluate APT-Agent on Metasploitable 2, targeting seven vulnerable services spanning web, database, and network protocols. APT-Agent achieves an 84.29% end-to-end exploitation success rate, compared to 48.57% (Script Kiddie) and 18.57% (PentestGPT) under matched conditions. By reducing the cognitive burden on human testers and minimizing reliance on human intervention, APT-Agent represents a step toward scalable, reliable, and cognitively efficient automation for penetration testing.
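
The abstract names two mechanisms, a hybrid rectification module and a command-specific memory architecture, without describing their internals. The sketch below is a minimal illustration under stated assumptions, not APT-Agent's implementation. Rectification is reduced to a deterministic half only: a hallucinated tool name is snapped to the closest entry in a hypothetical tool inventory using Python's difflib.get_close_matches, and None is returned where a hybrid system would presumably fall back to the LLM. Memory is modeled as the most recent command/output pairs filed under the tool that produced them. KNOWN_TOOLS, rectify_command, CommandMemory and run_step are all hypothetical names.

```python
import difflib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical inventory of tools assumed to be installed on the testing host.
KNOWN_TOOLS = ["nmap", "nikto", "sqlmap", "msfconsole", "curl"]


def rectify_command(command: str, known_tools=KNOWN_TOOLS,
                    cutoff: float = 0.6) -> Optional[str]:
    """Snap a possibly hallucinated tool name to the closest known tool.

    Returns the repaired command, or None when no known tool is similar
    enough (the point where a hybrid system could ask the LLM to regenerate).
    """
    command = command.strip()
    parts = command.split()
    if not parts:
        return None
    tool = parts[0]
    if tool in known_tools:
        return command
    matches = difflib.get_close_matches(tool, known_tools, n=1, cutoff=cutoff)
    if not matches:
        return None
    # Replace only the leading tool token; arguments are left untouched.
    return command.replace(tool, matches[0], 1)


@dataclass
class CommandMemory:
    """Hypothetical command-specific memory: each observation is filed under
    the tool that produced it, so a later step can pull only the relevant
    history instead of the whole transcript."""
    max_per_tool: int = 5
    _store: dict = field(default_factory=lambda: defaultdict(list), init=False)

    def record(self, command: str, output: str) -> None:
        tool = command.split()[0]
        entries = self._store[tool]
        entries.append((command, output))
        # Keep only the most recent entries per tool to bound prompt size.
        if len(entries) > self.max_per_tool:
            del entries[: len(entries) - self.max_per_tool]

    def context_for(self, tool: str) -> str:
        """Render the stored history for one tool as prompt-ready text."""
        return "\n".join(f"$ {cmd}\n{out}"
                         for cmd, out in self._store.get(tool, []))


def run_step(proposed: str, memory: CommandMemory,
             execute: Callable[[str], str]) -> Optional[str]:
    """Rectify an LLM-proposed command, run it, and remember its output."""
    fixed = rectify_command(proposed)
    if fixed is None:
        return None  # a full agent would re-prompt the LLM here
    memory.record(fixed, execute(fixed))
    return fixed
```

For example, run_step("nmpa -sV TARGET", CommandMemory(), executor) repairs the misspelled tool to "nmap -sV TARGET" before executing it, while an unrecognizable name such as "xyzzy" yields None. Keying memory by tool name is only one plausible reading of "command-specific"; the paper may index its memory differently.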
