CR, AI, Apr 29, 2025

SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories

arXiv:2504.21205v2, 4 citations, h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the need for better benchmarks to assess secure code generation in real-world settings, though it is incremental as it builds on prior benchmarking efforts.

The paper evaluates code agents on secure code completion in real-world repositories. It introduces SecRepoBench, a benchmark of 318 tasks across 27 C/C++ repositories covering 15 CWEs, and finds that code agents significantly outperform standalone LLMs, which struggle to generate completions that are both correct and secure.

This paper introduces SecRepoBench, a benchmark to evaluate code agents on secure code completion in real-world repositories. SecRepoBench has 318 code completion tasks in 27 C/C++ repositories, covering 15 CWEs. We evaluate 28 standalone LLMs and 13 code agents across 3 state-of-the-art agent frameworks using our benchmark. We find that state-of-the-art LLMs struggle with generating correct and secure code completions. However, code agents significantly outperform standalone LLMs. We show that SecRepoBench is more difficult than the prior state-of-the-art benchmark. Finally, our comprehensive analysis provides insights into potential directions for enhancing the ability of code agents to write correct and secure code in real-world repositories.

