SEAINov 9, 2025

LLM For Loop Invariant Generation and Fixing: How Far Are We?

arXiv:2511.06552v12 citationsh-index: 5Has Code
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of automated program safety assessment for software engineers and formal verification practitioners, but it is incremental as it benchmarks existing LLMs on a new task.

The study evaluated the performance of Large Language Models (LLMs) in generating and fixing loop invariants for program verification, finding that LLMs achieved a maximum success rate of 78% in generation but only 16% in repair, with performance improving when supplemented with auxiliary information like domain knowledge.

A loop invariant is a property of a loop that remains true before and after each execution of the loop. The identification of loop invariants is a critical step to support automated program safety assessment. Recent advancements in Large Language Models (LLMs) have demonstrated potential in diverse software engineering (SE) and formal verification tasks. However, we are not aware of the performance of LLMs to infer loop invariants. We report an empirical study of both open-source and closed-source LLMs of varying sizes to assess their proficiency in inferring inductive loop invariants for programs and in fixing incorrect invariants. Our findings reveal that while LLMs exhibit some utility in inferring and repairing loop invariants, their performance is substantially enhanced when supplemented with auxiliary information such as domain knowledge and illustrative examples. LLMs achieve a maximum success rate of 78\% in generating, but are limited to 16\% in repairing the invariant.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes