52.6SEJun 3Code
Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight ModelsSabrina Kaniewski, Fabian Schmidt, Tobias Heer
Large language models (LLMs) have shown strong potential for automated software vulnerability detection, particularly in retrieval-augmented generation (RAG) settings. However, for approaches relying on proprietary models and APIs, reproducibility and replicability remain largely unexplored, raising the question of whether reported results generalize or depend primarily on specific model choices. In this work, we present a reproducibility study of Vul-RAG, a RAG-based framework for source code vulnerability detection that enhances LLMs with high-level vulnerability knowledge. We first replicate the results in a fully local and open-weights setting using the reported open-weight baseline models. We then extend the evaluation to a diverse set of recent open-weight LLMs, including code-specialized, general-purpose, and reasoning models of varying parameter sizes. The results confirm that the findings of Vul-RAG are reproducible under local deployment, but with minor deviations. Across all evaluated models, we observe a performance plateau at approximately 0.30 pairwise accuracy (code pairs for which both the vulnerable and the patched function are correctly classified). Notably, this plateau persists even for more recent and advanced models, indicating that improvements in model capacity alone do not substantially enhance performance. Finally, we discuss practical implications and trade-offs between detection effectiveness, model capabilities, and model scale. Implementation and evaluation artifacts are publicly available at https://github.com/hs-esslingen-it-security/revisiting-Vul-RAG.
CRJan 30Code
Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise NetworksLukas Köder, Nils Lohmiller, Phil Schmieder et al.
The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication. To mitigate this threat, the adoption of Post-Quantum Cryptography (PQC) is critical. In this work, we investigate the performance impact of PQC algorithms on WPA-Enterprise-based authentication. To this end, we conduct an experimental evaluation of authentication latency using a testbed built with the open-source tools FreeRADIUS and hostapd, measuring the time spent at the client, access point, and RADIUS server. We evaluate multiple combinations of PQC algorithms and analyze their performance overhead in comparison to currently deployed cryptographic schemes. Beyond performance, we assess the security implications of these algorithm choices by relating authentication mechanisms to the quantum effort required for their exploitation. This perspective enables a systematic categorization of PQ-relevant weaknesses in WPA-Enterprise according to their practical urgency. The evaluation results show that, although PQC introduces additional authentication latency, combinations such as ML-DSA-65 and Falcon-1024 used in conjunction with ML-KEM provide a favorable trade-off between security and performance. Furthermore, we demonstrate that the resulting overhead can be effectively mitigated through session resumption. Overall, this work presents a first real-world performance evaluation of PQC-enabled WPA-Enterprise authentication and demonstrates its practical feasibility for enterprise Wi-Fi deployments.
SEJul 30, 2025
A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language ModelsSabrina Kaniewski, Fabian Schmidt, Markus Enzweiler et al.
The increasing adoption of Large Language Models (LLMs) in software engineering has sparked interest in their use for software vulnerability detection. However, the rapid development of this field has resulted in a fragmented research landscape, with diverse studies that are difficult to compare due to differences in, e.g., system designs and dataset usage. This fragmentation makes it difficult to obtain a clear overview of the state-of-the-art or compare and categorize studies meaningfully. In this work, we present a comprehensive systematic literature review (SLR) of LLM-based software vulnerability detection. We analyze 227 studies published between January 2020 and June 2025, categorizing them by task formulation, input representation, system architecture, and adaptation techniques. Further, we analyze the datasets used, including their characteristics, vulnerability coverage, and diversity. We present a fine-grained taxonomy of vulnerability detection approaches, identify key limitations, and outline actionable future research opportunities. By providing a structured overview of the field, this review improves transparency and serves as a practical guide for researchers and practitioners aiming to conduct more comparable and reproducible research. We publicly release all artifacts and maintain a living repository of LLM-based software vulnerability detection studies.