CR, AI · Jun 20, 2025

Towards Effective Complementary Security Analysis using Large Language Models

arXiv:2506.16899v2 · 1 citation · h-index: 22 · ISI
Originality: Incremental advance
AI Analysis

This work addresses the burden of manual security analysis on software developers by automating false positive reduction. Its originality is incremental, because it builds on existing LLM prompting techniques.

The paper tackles the problem of reducing false positives (FPs) in static application security testing (SAST) reports by using Large Language Models (LLMs) to assess the findings. When detections from several LLMs are combined, it flags up to 78.9% of FPs on the OWASP Benchmark data and 38.46% on real-world data, without missing any genuine weaknesses.
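The headline numbers are FP detection rates measured under a zero-missed-weakness constraint. The sketch below shows one way to compute that metric and to combine several models' detections. It assumes (this is not confirmed by the summary) that "combining" means taking the union of the findings each model flags as FP, and that the ground truth maps hypothetical finding IDs to whether each finding is a real weakness.

```python
from typing import Dict, Iterable, Set, Tuple


def evaluate(flagged_fp: Set[str], is_true_weakness: Dict[str, bool]) -> Tuple[float, int]:
    """Return (share of actual FPs that were flagged, number of genuine weaknesses wrongly flagged).

    is_true_weakness: hypothetical ground truth, finding ID -> True if the SAST finding is a real weakness.
    """
    actual_fps = {f for f, real in is_true_weakness.items() if not real}
    detected = flagged_fp & actual_fps
    # A genuine weakness flagged as FP counts as a missed weakness.
    missed_weaknesses = sum(1 for f in flagged_fp if is_true_weakness.get(f, False))
    rate = len(detected) / len(actual_fps) if actual_fps else 0.0
    return rate, missed_weaknesses


def combine(per_model_flags: Iterable[Set[str]]) -> Set[str]:
    """Assumed combination rule: union of the findings each model flags as FP."""
    combined: Set[str] = set()
    for flags in per_model_flags:
        combined |= flags
    return combined


if __name__ == "__main__":
    truth = {"f1": False, "f2": False, "f3": False, "f4": False, "t1": True, "t2": True}
    model_a = {"f1", "f2"}
    model_b = {"f2", "f3"}
    print(evaluate(model_a, truth))                     # single model
    print(evaluate(combine([model_a, model_b]), truth))  # union of both models
```

With the toy data, model A alone flags half of the four FPs, while the union flags three of four. A union can only keep the missed-weakness count at zero if every contributing model already misses none, which matches the "without missing genuine weaknesses" framing.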

A key challenge in security analysis is the manual evaluation of potential security weaknesses generated by static application security testing (SAST) tools. Numerous false positives (FPs) in these reports reduce the effectiveness of security analysis. We propose using Large Language Models (LLMs) to improve the assessment of SAST findings. We investigate the ability of LLMs to reduce FPs while trying to maintain a perfect true positive rate, using datasets extracted from the OWASP Benchmark (v1.2) and a real-world software project. Our results indicate that advanced prompting techniques, such as Chain-of-Thought and Self-Consistency, substantially improve FP detection. Notably, some LLMs identified approximately 62.5% of FPs in the OWASP Benchmark dataset without missing genuine weaknesses. Combining detections from different LLMs would increase this FP detection to approximately 78.9%. Additionally, we demonstrate our approach's generalizability using a real-world dataset covering five SAST tools, three programming languages, and infrastructure files. The best LLM detected 33.85% of all FPs without missing genuine weaknesses, while combining detections from different LLMs would increase this detection to 38.46%. Our findings highlight the potential of LLMs to complement traditional SAST tools, enhancing automation and reducing resources spent addressing false alarms.
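Self-Consistency typically samples several Chain-of-Thought answers for the same prompt and keeps the majority answer. A minimal sketch of that aggregation step for one SAST finding follows. The verdict labels, the `min_fp_share` threshold, and the rule that ties or empty input keep the finding are illustrative assumptions, chosen to favour never dismissing a genuine weakness. They are not the paper's documented settings, and the LLM sampling itself is left out.

```python
from collections import Counter
from typing import List

# Hypothetical verdict labels; the paper's actual answer format is not given here.
FALSE_POSITIVE = "false_positive"
TRUE_POSITIVE = "true_positive"


def self_consistency_verdict(sampled_verdicts: List[str], min_fp_share: float = 0.5) -> str:
    """Aggregate independently sampled Chain-of-Thought verdicts for one finding.

    The finding is dismissed as FALSE_POSITIVE only if the share of FP votes is
    strictly greater than min_fp_share (0.5 = strict majority). Any other label,
    e.g. an unparsable answer, counts as a vote to keep the finding.
    """
    if not sampled_verdicts:
        return TRUE_POSITIVE  # no evidence: conservatively keep the finding
    fp_share = Counter(sampled_verdicts)[FALSE_POSITIVE] / len(sampled_verdicts)
    return FALSE_POSITIVE if fp_share > min_fp_share else TRUE_POSITIVE


if __name__ == "__main__":
    votes = [FALSE_POSITIVE] * 3 + [TRUE_POSITIVE] * 2
    print(self_consistency_verdict(votes))                    # strict majority
    print(self_consistency_verdict(votes, min_fp_share=0.8))  # stricter threshold
```

Raising `min_fp_share` trades FP detection for a lower risk of discarding a real weakness, which is the same trade-off the abstract frames as keeping a perfect true positive rate.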

