CR AI · Jun 20, 2025

Towards Effective Complementary Security Analysis using Large Language Models

Jonas Wagner, Simon Müller, Christian Näther, Jan-Philipp Steghöfer, Andreas Both

arXiv:2506.16899v2 · 1 citation · h-index: 22 · ISI

Originality: Incremental advance

AI Analysis

The paper addresses the burden that manual review of security findings places on software developers by automating false positive reduction. The advance is incremental because it builds on existing LLM techniques.

The paper tackles the problem of reducing false positives in static application security testing (SAST) reports by using Large Language Models (LLMs) to assess the findings. When detections from different LLMs are combined, up to 78.9% of false positives are identified on benchmark data and 38.46% on real-world data, in both cases without missing genuine weaknesses.

A key challenge in security analysis is the manual evaluation of potential security weaknesses generated by static application security testing (SAST) tools. Numerous false positives (FPs) in these reports reduce the effectiveness of security analysis. We propose using Large Language Models (LLMs) to improve the assessment of SAST findings. We investigate the ability of LLMs to reduce FPs while trying to maintain a perfect true positive rate, using datasets extracted from the OWASP Benchmark (v1.2) and a real-world software project. Our results indicate that advanced prompting techniques, such as Chain-of-Thought and Self-Consistency, substantially improve FP detection. Notably, some LLMs identified approximately 62.5% of FPs in the OWASP Benchmark dataset without missing genuine weaknesses. Combining detections from different LLMs would increase this FP detection to approximately 78.9%. Additionally, we demonstrate our approach's generalizability using a real-world dataset covering five SAST tools, three programming languages, and infrastructure files. The best LLM detected 33.85% of all FPs without missing genuine weaknesses, while combining detections from different LLMs would increase this detection to 38.46%. Our findings highlight the potential of LLMs to complement traditional SAST tools, enhancing automation and reducing resources spent addressing false alarms.
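The evaluation idea in the abstract (Self-Consistency voting per model, then combining detections across models while tracking missed true positives) can be sketched as below. This is an illustrative sketch only, not the authors' implementation: the function names, the toy data, the tie-breaking rule (ties keep the finding), and the use of a set union to combine models are all assumptions, since the abstract does not specify these details.

```python
from collections import Counter


def self_consistency_vote(samples):
    """Majority vote over several sampled verdicts for one finding.

    Assumption: a tie keeps the finding ("TP"), so a weakness is only
    dismissed when most samples agree it is a false positive.
    """
    counts = Counter(samples)
    return "FP" if counts["FP"] > counts["TP"] else "TP"


def combine_fp_detections(verdicts_per_model):
    """Assumed combination rule: a finding is dismissed if ANY model flags it as FP."""
    flagged = set()
    for verdicts in verdicts_per_model.values():
        flagged |= {fid for fid, verdict in verdicts.items() if verdict == "FP"}
    return flagged


def evaluate(flagged, ground_truth):
    """Share of real FPs that were flagged, plus the number of genuine weaknesses dismissed."""
    fps = {fid for fid, label in ground_truth.items() if label == "FP"}
    tps = set(ground_truth) - fps
    return {
        "fp_detection_rate": len(flagged & fps) / len(fps) if fps else 0.0,
        "missed_true_positives": len(flagged & tps),
    }


# Hypothetical toy data: five SAST findings with known labels.
ground_truth = {"f1": "FP", "f2": "FP", "f3": "TP", "f4": "FP", "f5": "TP"}

# Hypothetical sampled verdicts: three reasoning paths per finding per model.
sampled = {
    "model_a": {
        "f1": ["FP", "FP", "TP"],
        "f2": ["TP", "TP", "FP"],
        "f3": ["TP", "TP", "TP"],
        "f4": ["TP", "TP", "TP"],
        "f5": ["TP", "FP", "TP"],
    },
    "model_b": {
        "f1": ["TP", "TP", "TP"],
        "f2": ["FP", "FP", "FP"],
        "f3": ["TP", "TP", "FP"],
        "f4": ["TP", "FP", "TP"],
        "f5": ["TP", "TP", "TP"],
    },
}

# One verdict per finding per model, via majority vote.
verdicts = {
    model: {fid: self_consistency_vote(s) for fid, s in findings.items()}
    for model, findings in sampled.items()
}

combined = combine_fp_detections(verdicts)
per_model = {
    model: evaluate({f for f, v in vs.items() if v == "FP"}, ground_truth)
    for model, vs in verdicts.items()
}
combined_result = evaluate(combined, ground_truth)
```

In this toy setup each model alone flags one of the three false positives, while the union flags two and still dismisses no genuine weakness. A union can only keep the true positive rate perfect if every contributing model already does so on its own, which is why the sketch reports missed true positives alongside the detection rate.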
