CR · LG · SE · Sep 23, 2025

LLM-based Vulnerability Discovery through the Lens of Code Metrics

arXiv:2509.19117v1 · 7 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work identifies a critical limitation in applying LLMs to software vulnerability discovery: they appear to operate at a shallow level similar to that of basic code metrics. This is an incremental insight for researchers and practitioners in software engineering and AI.

The study investigates why progress in applying large language models (LLMs) to vulnerability discovery has stalled by analyzing them through classic code metrics. It finds that a simple classifier trained only on these metrics performs on par with state-of-the-art LLMs, and it reveals a strong correlation and a causal effect between LLM predictions and code metrics.

Large language models (LLMs) excel in many tasks of software engineering, yet progress in leveraging them for vulnerability discovery has stalled in recent years. To understand this phenomenon, we investigate LLMs through the lens of classic code metrics. Surprisingly, we find that a classifier trained solely on these metrics performs on par with state-of-the-art LLMs for vulnerability discovery. A root-cause analysis reveals a strong correlation and a causal effect between LLMs and code metrics: When the value of a metric is changed, LLM predictions tend to shift by a corresponding magnitude. This dependency suggests that LLMs operate at a similarly shallow level as code metrics, limiting their ability to grasp complex patterns and fully realize their potential in vulnerability discovery. Based on these findings, we derive recommendations on how research should more effectively address this challenge.
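The idea of a classifier "trained solely on code metrics" can be made concrete with a minimal sketch. It assumes a small, hypothetical metric set (non-blank lines, a cyclomatic-style branch count, maximum brace nesting depth, and a count of array/pointer accesses) and a plain logistic-regression model trained by batch gradient descent. The paper's actual metrics, model, and datasets are not given here; the function names, toy C snippets, and their labels are invented purely for illustration.

```python
import math
import re
from typing import List, Sequence

# Hypothetical, simplified metric set for illustration; not the paper's exact metrics.
BRANCH_PATTERN = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\|")


def code_metrics(src: str) -> List[float]:
    """Return [non-blank lines, branch count, max brace nesting, array/pointer accesses]."""
    loc = sum(1 for line in src.splitlines() if line.strip())
    branches = len(BRANCH_PATTERN.findall(src))
    depth = max_depth = 0
    for ch in src:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    accesses = src.count("[") + src.count("->")
    return [float(loc), float(branches), float(max_depth), float(accesses)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MetricClassifier:
    """Logistic regression over standardized metric vectors, trained by batch gradient descent."""

    def __init__(self, lr: float = 0.1, epochs: int = 500) -> None:
        self.lr, self.epochs = lr, epochs

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "MetricClassifier":
        n, d = len(X), len(X[0])
        cols = list(zip(*X))
        self.means = [sum(c) / n for c in cols]
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
                     for c, m in zip(cols, self.means)]
        Z = [self._scale(x) for x in X]
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for z, target in zip(Z, y):
                err = sigmoid(self._logit(z)) - target  # d(log-loss)/d(logit)
                grad_w = [g + err * v for g, v in zip(grad_w, z)]
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def _scale(self, x: Sequence[float]) -> List[float]:
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def _logit(self, z: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(self.w, z)) + self.b

    def predict_proba(self, x: Sequence[float]) -> float:
        """Probability that a function with raw metric vector x is labeled vulnerable."""
        return sigmoid(self._logit(self._scale(x)))


# Invented toy data: 1 = "vulnerable", 0 = "benign". Labels are illustrative only.
TOY_SAMPLES = [
    ("int add(int a, int b) {\n  return a + b;\n}", 0),
    ("int is_zero(int x) {\n  return x == 0;\n}", 0),
    ("void copy(char *dst, const char *src, int n) {\n"
     "  for (int i = 0; i < n; i++) {\n    if (src[i] == 0) {\n      break;\n    }\n"
     "    dst[i] = src[i];\n  }\n}", 1),
    ("int parse(struct pkt *p) {\n  while (p->len > 0) {\n"
     "    if (p->buf[p->off] && p->off < 64) {\n      p->off++;\n    }\n"
     "    p->len--;\n  }\n  return p->off;\n}", 1),
]

if __name__ == "__main__":
    X = [code_metrics(src) for src, _ in TOY_SAMPLES]
    y = [label for _, label in TOY_SAMPLES]
    model = MetricClassifier().fit(X, y)
    for (src, label), x in zip(TOY_SAMPLES, X):
        print(src.split("(")[0], label, round(model.predict_proba(x), 3))
```

Because every feature is standardized before training, each learned weight indicates how strongly one metric pushes a function toward the "vulnerable" label. A baseline like this sees only such surface-level signals, which is what makes it a useful point of comparison for LLM predictions.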
