LG AI CR SE Mar 10, 2025

Evaluating LLaMA 3.2 for Software Vulnerability Detection

José Gonçalves, Miguel Silva, Bernardo Cabral, Tiago Dias, Eva Maia, Isabel Praça, Ricardo Severino, Luís Lino Ferreira

arXiv:2503.07770v1 13.08 citations h-index: 29 EICC

Originality: Synthesis-oriented

AI Analysis

This work addresses data quality issues for researchers and practitioners in cybersecurity, but it is incremental as it builds on an existing dataset and model.

The authors tackled the challenge of limited real-world data for deep learning in software vulnerability detection by refining the DiverseVul dataset and fine-tuning LLaMA 3.2, resulting in an F1-Score improvement from 47% to 66%.

Deep Learning (DL) has emerged as a powerful tool for vulnerability detection, often outperforming traditional solutions. However, developing effective DL models requires large amounts of real-world data, which can be difficult to obtain in sufficient quantities. To address this challenge, the DiverseVul dataset was curated as the largest dataset of vulnerable and non-vulnerable C/C++ functions extracted exclusively from real-world projects. Its goal is to provide high-quality, large-scale samples for training DL models. However, while applying pre-processing techniques during our study, we identified several inconsistencies in the raw dataset, highlighting the need for a refined version. In this work, we present a refined version of the DiverseVul dataset and use it to fine-tune a large language model, LLaMA 3.2, for vulnerability detection. Experimental results show that pre-processing improved performance: the model achieved an F1-Score of 66% in software vulnerability detection, a competitive result that exceeds the 47% F1-Score of our baseline.
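For readers unfamiliar with the metric, the F1-Score reported above is the harmonic mean of precision and recall on the positive (vulnerable) class. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `f1_score` and the toy labels are illustrative assumptions and do not come from the paper.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = vulnerable, 0 = non-vulnerable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # vulnerable, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # vulnerable, missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(f"F1 = {f1_score(y_true, y_pred):.4f}")
```

Because F1 ignores true negatives, it is a common choice for vulnerability datasets, where non-vulnerable functions usually far outnumber vulnerable ones and plain accuracy can look high even when most vulnerabilities are missed.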
