CR AI CL SE · Dec 16, 2024

Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection

Ira Ceka, Feitong Qiao, Anik Dey, Aastha Valecha, Gail Kaiser, Baishakhi Ray

arXiv:2412.12039v3 · 8.56 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the high error rates and labor-intensive nature of static analysis for developers and security experts, though it appears incremental because it builds on existing LLM methods.

The study tackles vulnerability detection in code by investigating whether LLM prompting can replace static analysis tools. It finds that the proposed prompting strategies improve accuracy over a static-analyzer baseline by up to 31.6% and reduce false negative rates by up to 37.6%.
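To make the reported metrics concrete, here is a minimal sketch of how accuracy, F1, false negative rate (FNR), and pairwise accuracy could be computed for a binary "vulnerable / not vulnerable" classifier. It is an illustration under assumptions, not the authors' evaluation code: the function names are hypothetical, "vulnerable" is treated as the positive class, and pairwise accuracy follows the common convention that a (vulnerable, patched) pair counts as correct only when the vulnerable version is flagged and the patched version is not.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts; 'positive' means 'predicted vulnerable'."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(labels: Sequence[bool], preds: Sequence[bool]) -> Confusion:
    """Count outcomes for parallel sequences of ground-truth labels and predictions."""
    pairs = [(bool(y), bool(p)) for y, p in zip(labels, preds)]
    return Confusion(
        tp=sum(y and p for y, p in pairs),
        fp=sum((not y) and p for y, p in pairs),
        tn=sum((not y) and (not p) for y, p in pairs),
        fn=sum(y and (not p) for y, p in pairs),
    )


def accuracy(c: Confusion) -> float:
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def f1(c: Confusion) -> float:
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def fnr(c: Confusion) -> float:
    # Fraction of truly vulnerable samples the detector missed.
    positives = c.tp + c.fn
    return c.fn / positives if positives else 0.0


def pairwise_accuracy(pair_preds: Sequence[Tuple[bool, bool]]) -> float:
    """Each item is (prediction on vulnerable version, prediction on patched version).

    Assumed convention: a pair is correct only if the vulnerable version is
    flagged AND the patched version is not.
    """
    if not pair_preds:
        return 0.0
    correct = sum(vuln and not patched for vuln, patched in pair_preds)
    return correct / len(pair_preds)
```

For example, with `labels = [True, True, True, False, False, False]` and `preds = [True, False, True, False, True, False]`, the counts are two true positives, one false positive, two true negatives, and one false negative, giving accuracy and F1 of 2/3 and an FNR of 1/3. Pairwise accuracy is stricter than per-sample accuracy because a detector that flags everything scores zero on it.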

Despite their remarkable success, large language models (LLMs) have shown limited ability on safety-critical code tasks such as vulnerability detection. Vulnerability detection is typically performed with static analysis (SA) tools such as CodeQL and CodeGuru Security. SA relies on predefined, manually crafted rules for flagging various vulnerabilities. Thus, the effectiveness of SA in detecting vulnerabilities depends on human experts, and SA is known to report high error rates. In this study, we investigate whether LLM prompting can be an effective alternative to these static analyzers in the partial code setting. We propose prompting strategies that integrate natural language instructions of vulnerabilities with contrastive chain-of-thought reasoning, augmented using contrastive samples from a synthetic dataset. Our findings demonstrate that security-aware prompting techniques can be effective alternatives to the laborious, hand-crafted rules of static analyzers, which often result in high false negative rates in the partial code setting. When leveraging SOTA reasoning models such as DeepSeek-R1, each of our prompting strategies exceeds the static analyzer baseline, with the best strategies improving accuracy by as much as 31.6%, F1-scores by 71.7%, and pairwise accuracies by 60.4%, and reducing the false negative rate (FNR) by as much as 37.6%.
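The abstract describes prompts that combine a natural-language description of a vulnerability with contrastive chain-of-thought reasoning over a (vulnerable, patched) sample pair. The sketch below shows one plausible way to assemble such a prompt and read back a YES/NO verdict. Everything here is an assumption for illustration: `ContrastivePair`, `build_prompt`, and `parse_verdict` are hypothetical names, and the wording and section order are not the authors' actual templates. No LLM API is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContrastivePair:
    """Hypothetical synthetic example: vulnerable code, its fix, and why they differ."""
    vulnerable_code: str
    patched_code: str
    reasoning: str  # contrastive explanation of why only one version is vulnerable


def build_prompt(cwe_id: str, cwe_description: str,
                 pair: ContrastivePair, target_code: str) -> str:
    """Assemble a contrastive chain-of-thought prompt (illustrative wording only)."""
    sections = [
        "You are a security auditor. Decide whether the target code contains "
        "the vulnerability described below.",
        f"Vulnerability ({cwe_id}): {cwe_description}",
        "Example of vulnerable code:\n" + pair.vulnerable_code,
        "The same code after the fix:\n" + pair.patched_code,
        "Why the first is vulnerable and the second is not:\n" + pair.reasoning,
        "Target code:\n" + target_code,
        "Reason step by step, contrasting the target with the examples, "
        "then end with a final line containing YES or NO.",
    ]
    return "\n\n".join(sections)


def parse_verdict(response: str) -> Optional[bool]:
    """Read YES/NO from the last non-empty line; None if no verdict is found."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    if not lines:
        return None
    verdicts = {"YES": True, "NO": False}
    # Scan right to left so "Answer: YES" is handled.
    for token in reversed(lines[-1].split()):
        word = token.strip(".,:;!*").upper()
        if word in verdicts:
            return verdicts[word]
    return None


if __name__ == "__main__":
    pair = ContrastivePair(
        vulnerable_code="int get(int *p) { return *p; }",
        patched_code="int get(int *p) { return p ? *p : -1; }",
        reasoning="The first dereferences p without a NULL check; the second checks it.",
    )
    print(build_prompt("CWE-476", "NULL pointer dereference.", pair,
                       "int len(char *s) { return strlen(s); }"))
```

Keeping the vulnerability description, the contrastive pair, and the target in separate sections makes it easy to ablate each component. For instance, dropping the pair yields a description-only prompt, which is one way to compare strategies against each other and against a static-analyzer baseline.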

