RefleXGen: The unexamined code is not worth using
RefleXGen addresses security challenges in AI-generated code for developers and users. It represents a novel method for a known bottleneck rather than an incremental improvement.
This paper tackles the problem of security in code generation by large language models (LLMs) by introducing RefleXGen, a method that integrates Retrieval-Augmented Generation (RAG) with guided self-reflection mechanisms, resulting in substantial improvements in code security across multiple models, including a 13.6% improvement with GPT-3.5 Turbo and 6.7% with GPT-4o.
Security in code generation remains a pivotal challenge when applying large language models (LLMs). This paper introduces RefleXGen, an innovative method that significantly enhances code security by integrating Retrieval-Augmented Generation (RAG) techniques with guided self-reflection mechanisms inherent in LLMs. Traditional approaches rely on fine-tuning LLMs or developing specialized secure code datasets, both of which can be resource-intensive. RefleXGen instead iteratively optimizes the code generation process through self-assessment and reflection, without the need for extensive resources. Within this framework, the model continuously accumulates and refines its knowledge base, thereby progressively improving the security of the generated code. Experimental results demonstrate that RefleXGen substantially enhances code security across multiple models, achieving a 13.6% improvement with GPT-3.5 Turbo, a 6.7% improvement with GPT-4o, a 4.5% improvement with CodeQwen, and a 5.8% improvement with Gemini. Our findings highlight that improving the quality of model self-reflection constitutes an effective and practical strategy for strengthening the security of AI-generated code.
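The abstract describes the loop only at a high level, so the following is a minimal sketch of one way such a retrieve, generate, reflect, and accumulate cycle could be wired together, not the authors' implementation. Every name here (`KnowledgeBase`, `generate_with_reflection`, `toy_generate`, `toy_reflect`) is hypothetical, the two toy functions stand in for real LLM calls, and keyword overlap stands in for whatever retrieval method the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class KnowledgeBase:
    """Toy store of (task, security note) pairs; stands in for a RAG index."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, task: str, note: str) -> None:
        if (task, note) not in self.entries:
            self.entries.append((task, note))

    def retrieve(self, task: str, k: int = 3) -> List[str]:
        # Assumption: rank notes by word overlap between the query task and
        # the task each note came from. A real system would likely use
        # embedding similarity instead.
        words = set(task.lower().split())
        scored = []
        for stored_task, note in self.entries:
            overlap = len(words & set(stored_task.lower().split()))
            if overlap > 0:
                scored.append((overlap, note))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        notes: List[str] = []
        for _, note in scored:
            if note not in notes:
                notes.append(note)
        return notes[:k]


def generate_with_reflection(
    task: str,
    generate: Callable[[str, List[str]], str],
    reflect: Callable[[str], List[str]],
    kb: KnowledgeBase,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Return (code, number of reflection rounds that triggered a rewrite)."""
    code = generate(task, kb.retrieve(task))
    for round_no in range(max_rounds):
        issues = reflect(code)          # self-assessment of the draft
        if not issues:
            return code, round_no       # nothing flagged: accept the draft
        for issue in issues:
            kb.add(task, issue)         # accumulate what was learned
        code = generate(task, kb.retrieve(task))
    # Round budget exhausted; the last draft is returned without a final check.
    return code, max_rounds


# Hypothetical stand-ins for LLM calls, used only to make the sketch runnable.
def toy_generate(task: str, hints: List[str]) -> str:
    if any("subprocess" in hint for hint in hints):
        return "subprocess.run(['ls', path], check=True)"
    return "os.system('ls ' + path)"


def toy_reflect(code: str) -> List[str]:
    if "os.system" in code:
        return ["avoid os.system on untrusted input; "
                "use subprocess.run with an argument list"]
    return []
```

With these toy functions, a first call on an empty knowledge base needs one reflection round before the draft passes, while a later call on a similar task retrieves the stored note and passes without any rewrite. That is the accumulation effect the abstract attributes to the growing knowledge base, shown here only in miniature.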