CR LG | Jan 30, 2025

Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows

arXiv:2502.00064v1 | 8.67 citations | h-index: 3 | ICMLA

Originality: Synthesis-oriented

AI Analysis

This paper addresses unreliable LLM-based vulnerability detection, a problem for developers and security researchers. Its contribution is incremental: it evaluates existing models rather than proposing new methods.

This study evaluated how tokenized Java code length affects the accuracy and explicitness of ten large language models in vulnerability detection. Results were inconsistent across models: some, such as GPT-4, were robust to input length, while others showed performance significantly linked to it.

This study examines the impact of tokenized Java code length on the accuracy and explicitness of ten major LLMs in vulnerability detection. Using chi-square tests and known ground truth, we found inconsistencies across models: some, like GPT-4, Mistral, and Mixtral, showed robustness, while others exhibited a significant link between tokenized length and performance. We recommend future LLM development focus on minimizing the influence of input length for better vulnerability detection. Additionally, preprocessing techniques that reduce token count while preserving code structure could enhance LLM accuracy and explicitness in these tasks.
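
To illustrate the kind of chi-square test the abstract refers to, the sketch below tests whether detection correctness is independent of tokenized-length bin for a single model. It is a minimal illustration, not the authors' code: the length bins, the counts in `observed`, and the helper names `chi_square_independence` and `chi_square_p_value_even_dof` are all hypothetical, and the p-value helper uses the closed form that holds only for an even number of degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Return (statistic, dof) for Pearson's chi-square test of independence.

    `table` is a list of rows of observed counts, e.g. one row per
    tokenized-length bin and one column per outcome (correct, incorrect).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_value_even_dof(stat, dof):
    """Upper-tail p-value of the chi-square distribution for even dof only.

    For dof = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form implemented for positive even dof only")
    half = stat / 2.0
    term, acc = 1.0, 0.0
    for k in range(dof // 2):
        acc += term
        term *= half / (k + 1)
    return math.exp(-half) * acc


# HYPOTHETICAL counts for one model (not data from the paper).
# Rows: short, medium, long tokenized inputs; columns: correct, incorrect.
observed = [
    [80, 20],
    [70, 30],
    [50, 50],
]

stat, dof = chi_square_independence(observed)
p_value = chi_square_p_value_even_dof(stat, dof)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.2e}")
```

With these made-up counts the p-value falls well below 0.05, so independence would be rejected, which corresponds to a model whose performance is linked to input length. A robust model in the abstract's sense would instead yield a p-value too large to reject independence.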
