SE · AI · Jul 23, 2024

Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection

Xin Zhou, Duc-Manh Tran, Thanh Le-Cong, Ting Zhang, Ivana Clairine Irsan, Joshua Sumarlin, Bach Le, David Lo

arXiv:2407.16235v1 · 16.843 citations · h-index: 13 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses automated vulnerability detection for software security practitioners through a comparative analysis; it is incremental in that it evaluates existing methods rather than introducing new ones.

The paper compared 15 SAST tools and 12 LLMs for detecting vulnerabilities in Java, C, and Python repositories. It found that SAST tools had low detection rates with few false positives, while LLMs detected 90–100% of vulnerabilities but with many false positives; ensembling the two mitigated these drawbacks to some extent.

Software vulnerabilities pose significant security challenges and potential risks to society, necessitating extensive efforts in automated vulnerability detection. Two popular lines of work address this problem. On one hand, Static Application Security Testing (SAST) tools are commonly used to scan source code for security vulnerabilities, especially in industry. On the other hand, deep learning (DL)-based methods, especially since the introduction of large language models (LLMs), have demonstrated their potential in software vulnerability detection. However, no study has yet compared SAST tools and LLMs to determine their effectiveness in vulnerability detection, understand the pros and cons of each, and explore how these two families of approaches might be combined. In this paper, we compared 15 diverse SAST tools with 12 popular or state-of-the-art open-source LLMs in detecting software vulnerabilities in repositories written in three popular programming languages: Java, C, and Python. The experimental results showed that SAST tools obtain low vulnerability detection rates with relatively few false positives, while LLMs can detect 90% to 100% of vulnerabilities but suffer from high false-positive rates. By further ensembling the SAST tools and LLMs, the drawbacks of both can be mitigated to some extent. Our analysis sheds light on both the current progress and future directions for software vulnerability detection.
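The abstract does not specify how the SAST tools and LLMs are ensembled. As a rough illustration of the recall/false-positive trade-off it describes, the sketch below assumes a generic k-of-n voting scheme (an item is flagged when at least k detectors flag it; k=1 is the union, k=n the intersection). This is not necessarily the paper's strategy, and the detector names (`sast_tool`, `llm_a`, `llm_b`, `llm_c`) and function identifiers are hypothetical toy data.

```python
from collections import Counter
from typing import Dict, Set


def vote_ensemble(flags_by_detector: Dict[str, Set[str]], k: int) -> Set[str]:
    """Flag an item if at least k detectors flagged it (k=1 is the union)."""
    votes = Counter(item for flags in flags_by_detector.values() for item in flags)
    return {item for item, count in votes.items() if count >= k}


def detection_rate(flagged: Set[str], vulnerable: Set[str]) -> float:
    """Fraction of truly vulnerable items that were flagged (recall)."""
    return len(flagged & vulnerable) / len(vulnerable) if vulnerable else 0.0


def false_positives(flagged: Set[str], vulnerable: Set[str]) -> int:
    """Number of flagged items that are not actually vulnerable."""
    return len(flagged - vulnerable)


# Hypothetical ground truth: vulnerable functions in one toy repository.
VULNERABLE = {"parse_header", "copy_buf", "exec_cmd", "load_cfg"}

# Hypothetical detector outputs: one conservative SAST tool, three noisy LLMs.
DETECTORS = {
    "sast_tool": {"copy_buf"},
    "llm_a": {"parse_header", "copy_buf", "exec_cmd", "load_cfg", "init", "main"},
    "llm_b": {"parse_header", "copy_buf", "exec_cmd", "init", "log_msg"},
    "llm_c": {"copy_buf", "exec_cmd", "load_cfg", "free_ctx"},
}

if __name__ == "__main__":
    # Each detector on its own.
    for name, flags in DETECTORS.items():
        print(f"{name:>9}: detection={detection_rate(flags, VULNERABLE):.2f} "
              f"FP={false_positives(flags, VULNERABLE)}")
    # Voting ensembles from the union (k=1) to the intersection (k=n).
    for k in range(1, len(DETECTORS) + 1):
        flagged = vote_ensemble(DETECTORS, k)
        print(f"vote k={k}: detection={detection_rate(flagged, VULNERABLE):.2f} "
              f"FP={false_positives(flagged, VULNERABLE)}")
```

Lowering k behaves like the high-recall, noisy LLMs; raising k behaves like the conservative SAST tool. On this made-up data k=2 keeps every vulnerable function while dropping most false alarms, but that is an artifact of the toy sets, not a result reported by the paper.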
