CR | AI | May 15, 2025

AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents

arXiv:2505.10321v1 | 11 citations | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for vulnerability management in cybersecurity: automating penetration tests could reduce their cost and allow them to be run more frequently.

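The paper tackles the automation of penetration testing with LLM agents. It presents AutoPentest, which slightly outperforms manual use of the ChatGPT-4o interface on three Hack The Box machines; both approaches complete 15-25% of the subtasks. AutoPentest cost $96.20 in total across all experiments, compared with $20 for a one-month ChatGPT Plus subscription.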

A recent area of increasing research is the use of Large Language Models (LLMs) in penetration testing, which promises to reduce costs and thus allow for higher frequency. We conduct a review of related work, identifying best practices and common evaluation issues. We then present AutoPentest, an application for performing black-box penetration tests with a high degree of autonomy. AutoPentest is based on the LLM GPT-4o from OpenAI and the LLM agent framework LangChain. It can perform complex multi-step tasks, augmented by external tools and knowledge bases. We conduct a study on three capture-the-flag style Hack The Box (HTB) machines, comparing our implementation AutoPentest with the baseline approach of manually using the ChatGPT-4o user interface. Both approaches are able to complete 15-25% of the subtasks on the HTB machines, with AutoPentest slightly outperforming ChatGPT. We measure a total cost of US$96.20 when using AutoPentest across all experiments, while a one-month subscription to ChatGPT Plus costs US$20. The results show that further implementation efforts and the use of more powerful LLMs released in the future are likely to make this a viable part of vulnerability management.
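
The two cost figures in the abstract can be put side by side with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the comparison is not like-for-like, since the AutoPentest figure is a total across all experiments while the ChatGPT Plus figure is a flat monthly fee.

```python
from decimal import Decimal

# Figures quoted in the abstract (US dollars).
AUTOPENTEST_TOTAL_COST = Decimal("96.20")  # AutoPentest, all experiments
CHATGPT_PLUS_MONTHLY = Decimal("20")       # one-month ChatGPT Plus subscription


def cost_ratio(total_cost: Decimal, subscription: Decimal) -> Decimal:
    """Return how many times larger total_cost is than subscription."""
    return total_cost / subscription


def cost_difference(total_cost: Decimal, subscription: Decimal) -> Decimal:
    """Return the absolute difference between the two costs."""
    return total_cost - subscription


if __name__ == "__main__":
    print(cost_ratio(AUTOPENTEST_TOTAL_COST, CHATGPT_PLUS_MONTHLY))       # 4.81
    print(cost_difference(AUTOPENTEST_TOTAL_COST, CHATGPT_PLUS_MONTHLY))  # 76.20
```

Under these figures, the AutoPentest experiments cost roughly 4.8 times the price of one month of ChatGPT Plus.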

Code Implementations: 1 repo