SE, CR, LG · Nov 27, 2024

Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs

arXiv:2411.18216v2 · 3 citations · h-index: 22 · Empir Softw Eng
Originality: Incremental advance
AI Analysis

This work addresses the problem of using LLMs to generate reliable security attack detectors for software developers. It is an incremental advance that integrates existing methods.

The paper integrates Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline so that the model has enough knowledge to generate robust security attack detectors. The F2-Score improves by up to 71 percentage points for XSS detection and up to 43 percentage points for SQLi detection.

Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. A key challenge is ensuring the LLMs have enough knowledge to address specific security requirements, such as information about existing attacks. For this, we propose an approach integrating Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the robustness of the output by incorporating external knowledge sources, while the Self-Ranking technique, inspired by the concept of Self-Consistency, generates multiple reasoning paths and creates ranks to select the most robust detector. Our extensive empirical study targets code generated by LLMs to detect two prevalent injection attacks in web security: Cross-Site Scripting (XSS) and SQL injection (SQLi). Results show a significant improvement in detection performance while employing RAG and Self-Ranking, with an increase of up to 71%pt (on average 37%pt) and up to 43%pt (on average 6%pt) in the F2-Score for XSS and SQLi detection, respectively.
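The abstract reports results as F2-Score, the F-beta measure with beta = 2, which weights recall more heavily than precision. The sketch below computes it and uses it to order a few candidate detectors. It is only an illustration under stated assumptions: the two detectors, the sample payloads, and the names `f_beta` and `rank_detectors` are hypothetical and hand-written here, and ranking by score on labeled samples is a simple stand-in, not the paper's Self-Ranking procedure, which the abstract does not describe in detail.

```python
import re
from typing import Callable, Dict, List, Tuple

Detector = Callable[[str], bool]


def f_beta(y_true: List[bool], y_pred: List[bool], beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta=2 gives the F2-Score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def rank_detectors(
    candidates: Dict[str, Detector], samples: List[Tuple[str, bool]]
) -> List[Tuple[str, float]]:
    """Score every candidate on the labeled samples, best F2 first."""
    labels = [label for _, label in samples]
    scored = []
    for name, detect in candidates.items():
        preds = [detect(text) for text, _ in samples]
        scored.append((name, f_beta(labels, preds)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for LLM-generated XSS detectors.
_BROAD = re.compile(r"<\s*script|on\w+\s*=|javascript:", re.IGNORECASE)

CANDIDATES: Dict[str, Detector] = {
    "naive_substring": lambda s: "<script" in s.lower(),
    "broad_regex": lambda s: _BROAD.search(s) is not None,
}

# Hypothetical labeled samples: (input, is_attack).
SAMPLES: List[Tuple[str, bool]] = [
    ("<script>alert(1)</script>", True),
    ("<img src=x onerror=alert(1)>", True),
    ('<a href="javascript:alert(1)">x</a>', True),
    ("<SCRIPT SRC=//evil.example/x.js></SCRIPT>", True),
    ("Hello, world", False),
    ("<b>bold text</b>", False),
    ("sort=asc&conference=2024", False),  # benign, but trips the broad regex
]

ranking = rank_detectors(CANDIDATES, SAMPLES)
for name, score in ranking:
    print(f"{name}: F2 = {score:.3f}")
```

On these samples the broad regex raises one false alarm but misses no attacks, while the substring check raises no false alarms but misses two of the four attacks. Because F2 penalizes missed attacks more than false alarms, the broad regex ranks first.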

Code Implementations: 1 repo
