SE, AI, PL · Nov 5, 2024

dafny-annotator: AI-Assisted Verification of Dafny Programs

arXiv:2411.15143v1 · 12 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the problem of high verification costs for Dafny users, offering an incremental improvement through AI-assisted annotation.

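The paper aims to reduce the cost of formally verifying Dafny programs with dafny-annotator, a tool that combines LLMs and search to add logical annotations to a method until the verifier can prove it correct. On a DafnyBench test set, greedy search guided by LLaMa 3.1 8B annotates 15.7% of the methods; fine-tuning on DafnyBench augmented with the synthetic DafnySynth dataset raises the success rate to 50.6%.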

Formal verification has the potential to drastically reduce software bugs, but its high additional cost has hindered large-scale adoption. While Dafny presents a promise to significantly reduce the effort to write verified programs, users are often required to provide logical annotations to aid the verifier. Here, we explore using a combination of Large Language Models and search to build dafny-annotator: a tool that adds logical annotations to a Dafny method until the verifier can prove it correct. On a test set from the DafnyBench collection of programs, greedy search guided by LLaMa 3.1 8B successfully annotates only 15.7% of the methods. Since this data-driven approach is hindered by the lack of large-scale training data, we propose a method for open-ended synthesis of new Dafny programs in a flexible pipeline where LLMs formulate high-level ideas, implement them, and incrementally propose changes to existing programs, which Dafny validates. This gives us a synthetic dataset, DafnySynth, which we use to augment DafnyBench for training. Fine-tuning on both datasets boosts LLaMa 8B's success rate to 50.6% -- significantly better than the base model, or training on either dataset alone. Our results suggest a path towards capable AI assistants for languages that don't yet have large-scale human-generated examples. In turn, such assistants might reduce friction for users and ultimately drive adoption.
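The greedy search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`greedy_annotate`, `toy_propose`, `toy_accepts`, `toy_verifies`) are hypothetical, the proposer is a fixed list standing in for the LLM, and the two checker functions are string tests standing in for the Dafny verifier. The sketch only shows the control flow: propose ranked annotations, keep the first one the checker does not reject, and stop once the method verifies.

```python
from typing import Callable, List, Optional, Tuple

Program = List[str]          # a method as a list of source lines
Candidate = Tuple[int, str]  # (index to insert before, annotation text)


def greedy_annotate(
    program: Program,
    propose: Callable[[Program], List[Candidate]],
    accepts: Callable[[Program], bool],
    verifies: Callable[[Program], bool],
    max_steps: int = 5,
) -> Optional[Program]:
    """Greedily insert annotations until `verifies` succeeds, else return None."""
    current = list(program)
    for _ in range(max_steps):
        if verifies(current):
            return current
        # Try candidates in ranked order and commit to the first accepted one.
        for index, annotation in propose(current):
            trial = current[:index] + [annotation] + current[index:]
            if accepts(trial):
                current = trial
                break
        else:
            return None  # every candidate was rejected
    return current if verifies(current) else None


# --- Toy stand-ins (assumptions, not the real LLM or the real Dafny verifier) ---

METHOD = [
    "method Count(n: nat) returns (i: nat)",
    "  ensures i == n",
    "{",
    "  i := 0;",
    "  while i < n",
    "  {",
    "    i := i + 1;",
    "  }",
    "}",
]

BAD = "    invariant i > n"         # false on loop entry
GOOD = "    invariant 0 <= i <= n"  # enough to establish the postcondition


def toy_propose(program: Program) -> List[Candidate]:
    # Stand-in for the LLM: fixed ranked guesses placed right after the loop header.
    loop = next(i for i, line in enumerate(program) if line.strip().startswith("while"))
    return [(loop + 1, guess) for guess in (BAD, GOOD) if guess not in program]


def toy_accepts(program: Program) -> bool:
    # Stand-in for "the verifier does not reject the new annotation".
    return BAD not in program


def toy_verifies(program: Program) -> bool:
    # Stand-in for "the verifier proves the method correct".
    return GOOD in program


result = greedy_annotate(METHOD, toy_propose, toy_accepts, toy_verifies)
```

In a real setting, `accepts` and `verifies` would invoke the Dafny verifier and `propose` would sample from the fine-tuned model; the greedy loop itself would stay the same shape, which is why the quality of the proposals dominates the success rate.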

