AI CL CR Nov 21, 2023

How Far Have We Gone in Vulnerability Detection Using Large Language Models

Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, Chao Zhang

arXiv:2311.12420v3 20.351 citations h-index: 7 Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of improving software security through better vulnerability detection, though it is incremental as it benchmarks existing LLMs rather than proposing a new method.

The authors evaluate 16 large language models (LLMs) against 6 state-of-the-art deep learning-based models and static analyzers on a new benchmark, VulBench, and find that several LLMs outperform traditional deep learning approaches in vulnerability detection.

As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of large language models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark VulBench. This benchmark aggregates high-quality data from a wide range of CTF (Capture-the-Flag) challenges and real-world applications, with annotations for each vulnerable function detailing the vulnerability type and its root cause. Through our experiments encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models and static analyzers, we find that several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing an untapped potential in LLMs. This work contributes to the understanding and utilization of LLMs for enhanced software security.
