SE · CR · LG · May 13

Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

arXiv:2605.13138 · 49.7
Predicted impact: top 51% in SE · last 90 days
Originality: Synthesis-oriented
AI Analysis

For security practitioners relying on automated patch detection, the study reveals fundamental limitations of current code-centric approaches, highlighting the need for alternative methods.

The paper evaluates code language models for detecting vulnerability-fixing commits, finding that models fail to learn transferable security knowledge from code alone, with code-only models missing over 93% of vulnerabilities at a 0.5% false positive rate.
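The "missed at a fixed false positive rate" figure can be made concrete with a small calculation. The sketch below is illustrative only and is not the paper's evaluation code: the function name `miss_rate_at_fpr`, the strict `score > threshold` decision rule, and the toy scores are all assumptions. It picks the lowest threshold that keeps the false positive rate within the budget, then reports the fraction of true vulnerability-fixing commits that fall below it.

```python
def miss_rate_at_fpr(labels, scores, max_fpr=0.005):
    """Fraction of positives missed at the most permissive threshold
    whose false positive rate does not exceed max_fpr.

    labels: 1 for a vulnerability-fixing commit, 0 otherwise.
    scores: model scores, higher means "more likely a fix".
    """
    negatives = sorted((s for y, s in zip(labels, scores) if y == 0), reverse=True)
    positives = [s for y, s in zip(labels, scores) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative example")

    # Number of negatives we may flag without exceeding the FPR budget.
    allowed_fp = int(max_fpr * len(negatives))

    # A commit is flagged when its score is strictly above the threshold, so
    # using the (allowed_fp + 1)-th highest negative score as the threshold
    # flags at most allowed_fp negatives, even when scores are tied.
    threshold = negatives[allowed_fp] if allowed_fp < len(negatives) else float("-inf")

    detected = sum(1 for s in positives if s > threshold)
    return 1.0 - detected / len(positives)


if __name__ == "__main__":
    # Toy data (hypothetical): four non-fix commits followed by four fix commits.
    toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    toy_scores = [0.1, 0.2, 0.6, 0.9, 0.3, 0.7, 0.8, 0.95]
    print(miss_rate_at_fpr(toy_labels, toy_scores, max_fpr=0.25))
```

With a budget as tight as 0.5%, the threshold sits near the top of the negative score distribution, so any overlap between the two score distributions translates directly into missed fixes.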

Automated detection of vulnerability-fixing commits (VFCs) is critical for timely security patch deployment, as advisory databases lag patch releases by a median of 25 days and many fixes never receive advisories. We present a comprehensive evaluation of VFC detection based on code language models through a unified framework consolidating over 20 fragmented datasets spanning more than 180,000 commits. Across over 180 experiments with fine-tuned models from 125M to 14B parameters, we find no evidence that models acquire transferable security-relevant code understanding from code changes alone. When commit messages are available, they dominate model attention; when they are removed, an attribution analysis shows that enriching diffs with additional intra-procedural semantic context does not shift model attention toward the code changes. Group-stratified evaluation exposes performance drops of approximately 17% compared to random splits, while temporal splits on aggregated datasets prove unreliable due to compositional shift in the underlying project distributions. At a false positive rate of 0.5%, all fine-tuned code-only models miss over 93% of vulnerabilities. Larger and more diverse training data or generative approaches show preliminary improvements but do not resolve the underlying limitations. To support future research on code-centric VFC detection, we release our unified framework and evaluation suite.
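The gap between random and group-stratified splits comes from leakage: with a random split, commits from the same group land on both sides, so a model can score well by recognising the group rather than the fix. The following is a minimal sketch of a group-level split, not the released framework's implementation. It assumes the group is the source repository (the abstract does not say which grouping is used), and the names `group_split`, `group_key`, and the `repo` field are hypothetical.

```python
import random


def group_split(commits, group_key, test_fraction=0.2, seed=0):
    """Split commits so that every group ends up entirely in train or in test.

    commits: list of commit records (any type).
    group_key: function mapping a commit to its group, e.g. its repository.
    """
    # Sort before shuffling so the result depends only on the seed,
    # not on set iteration order.
    groups = sorted({group_key(c) for c in commits})
    rng = random.Random(seed)
    rng.shuffle(groups)

    # Hold out at least one whole group for testing.
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])

    train = [c for c in commits if group_key(c) not in test_groups]
    test = [c for c in commits if group_key(c) in test_groups]
    return train, test


if __name__ == "__main__":
    # Hypothetical records: 5 repositories with 4 commits each.
    commits = [{"repo": f"project-{i % 5}", "sha": f"c{i}"} for i in range(20)]
    train, test = group_split(commits, group_key=lambda c: c["repo"])
    print(len(train), len(test))
```

Because whole groups are held out, test performance reflects how well the model transfers to code bases it has not seen, which is the setting the abstract's drop refers to.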
