CR | LG | Jan 30, 2025

Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows

arXiv:2502.00064v1 | 7 citations | h-index: 3 | ICMLA
Originality: Synthesis-oriented
AI Analysis

This paper addresses unreliable vulnerability detection in software security, a problem relevant to developers and researchers. Its contribution is incremental: it evaluates existing models rather than proposing new methods.

The study evaluates how tokenized Java code length affects the accuracy and explicitness of ten large language models in vulnerability detection. Results were inconsistent across models: some, such as GPT-4, were robust to input length, while the performance of others was significantly associated with it.

This study examines the impact of tokenized Java code length on the accuracy and explicitness of ten major LLMs in vulnerability detection. Using chi-square tests and known ground truth, we found inconsistencies across models: some, like GPT-4, Mistral, and Mixtral, showed robustness, while others exhibited a significant link between tokenized length and performance. We recommend future LLM development focus on minimizing the influence of input length for better vulnerability detection. Additionally, preprocessing techniques that reduce token count while preserving code structure could enhance LLM accuracy and explicitness in these tasks.
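The kind of chi-square test mentioned above can be illustrated with a minimal sketch. It assumes a 2x2 contingency table (short vs. long inputs, correct vs. incorrect detections); the binning, the table shape, and all counts below are hypothetical and are not taken from the paper, whose exact test setup is not described here.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table[i][j] is the count for length bin i (0 = short, 1 = long)
    and outcome j (0 = correct, 1 = incorrect).
    Returns (statistic, p_value) for 1 degree of freedom, without
    continuity correction. Assumes every expected cell count is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, for illustration only (not results from the paper).
robust_model = [[40, 10], [38, 12]]            # accuracy similar in both bins
length_sensitive_model = [[45, 5], [25, 25]]   # accuracy drops on long inputs

for name, table in [("robust", robust_model),
                    ("length-sensitive", length_sensitive_model)]:
    stat, p = chi_square_2x2(table)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: chi2={stat:.3f}, p={p:.4g} ({verdict} at 0.05)")
```

With these made-up counts, the first table shows no significant association between length bin and correctness at the 0.05 level, while the second does. A real analysis would likely use more length bins and a library routine that handles general table sizes.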

