SEAIFeb 11, 2025

Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK

arXiv:2502.07728v15 citationsh-index: 1ICSOFT
Originality Incremental advance
AI Analysis

This addresses reliability concerns for software developers using LLM-generated code, though it's incremental as it builds on existing verification methods.

The paper tackles the problem of verifying correctness of LLM-generated code by using formal software verification with the SPARK framework for Ada, achieving correct annotations for 50.7% of benchmark cases.

Large language models (LLMs) have demonstrated remarkable code generation capabilities, but the correctness of the generated code cannot be inherently trusted. This paper explores the feasibility of using formal software verification, specifically the SPARK framework for Ada, to ensure the reliability of LLM-generated code. We present Marmaragan, a tool that leverages an LLM in order to generate SPARK annotations for existing programs, enabling formal verification of the code. The tool is benchmarked on a curated set of SPARK programs, with annotations selectively removed to test specific capabilities. The performance of Marmaragan with GPT-4o on the benchmark is promising, with correct annotations having been generated for 50.7% of the benchmark cases. The results establish a foundation for future work on combining the power of LLMs with the reliability of formal software verification.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes