CR AI, Jan 12, 2025

Generative Artificial Intelligence-Supported Pentesting: A Comparison between Claude Opus, GPT-4, and Copilot

arXiv:2501.06963v2, 4 citations, h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work is aimed at cybersecurity professionals and evaluates AI tools for pentesting. It is incremental, since it compares existing models without introducing new methods.

The study compared three generative AI tools (Claude Opus, GPT-4, and Copilot) as support for penetration testing under the PTES framework. It found that the tools cannot fully automate the process but do improve efficiency and effectiveness in specific tasks, with Claude Opus performing best in the experiments.

The advent of Generative Artificial Intelligence (GenAI) has brought a significant change to our society. GenAI can be applied across numerous fields, with particular relevance in cybersecurity. Among the various areas of application, its use in penetration testing (pentesting) or ethical hacking processes is of special interest. In this paper, we have analyzed the potential of leading generic-purpose GenAI tools (Claude Opus, GPT-4 from ChatGPT, and Copilot) in augmenting the penetration testing process as defined by the Penetration Testing Execution Standard (PTES). Our analysis involved evaluating each tool across all PTES phases within a controlled virtualized environment. The findings reveal that, while these tools cannot fully automate the pentesting process, they provide substantial support by enhancing efficiency and effectiveness in specific tasks. Notably, all tools demonstrated utility; however, Claude Opus consistently outperformed the others in our experimental scenarios.

