SE AIJun 17, 2024

Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Xueying Du, Geng Zheng, Kaixin Wang, Yi Zou, Yujia Wang, Wentai Deng, Jiayi Feng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

arXiv:2406.11147v325.6129 citations

Originality Incremental advance

AI Analysis

This addresses a critical limitation in automated software security for developers and security analysts, though it is an incremental improvement on existing RAG methods.

The paper tackles the problem of LLMs struggling to distinguish vulnerable from patched code in vulnerability detection, achieving a 16-24% accuracy improvement and detecting 10 previously-unknown bugs in the Linux kernel.

Although LLMs have shown promising potential in vulnerability detection, this study reveals their limitations in distinguishing between vulnerable and similar-but-benign patched code (only 0.06 - 0.14 accuracy). It shows that LLMs struggle to capture the root causes of vulnerabilities during vulnerability detection. To address this challenge, we propose enhancing LLMs with multi-dimensional vulnerability knowledge distilled from historical vulnerabilities and fixes. We design a novel knowledge-level Retrieval-Augmented Generation framework Vul-RAG, which improves LLMs with an accuracy increase of 16% - 24% in identifying vulnerable and patched code. Additionally, vulnerability knowledge generated by Vul-RAG can further (1) serve as high-quality explanations to improve manual detection accuracy (from 60% to 77%), and (2) detect 10 previously-unknown bugs in the recent Linux kernel release with 6 assigned CVEs.

View on arXiv PDF

Similar