SE · AI · PL · Apr 17

Certified Program Synthesis with a Multi-Modal Verifier

arXiv:2604.16584 · 17.81 citations · h-index: 5
Predicted impact: top 27% in SE · last 90 days · Originality: Incremental advance
AI Analysis

For researchers and practitioners in program synthesis and formal verification, this work addresses the dual bottlenecks of specification validation and verifier fragmentation, enabling more robust and automated certified code generation.

Certified program synthesis (vericoding) from natural language is hard for two reasons: synthesised specifications are often too weak or too strong, and program verifiers are fragmented across reasoning modes. The authors propose LeetProof, an agentic pipeline built on Velvet, a multi-modal verifier embedded in Lean that combines dynamic validation, automated proofs, and interactive proof scripting. LeetProof achieves a significantly higher rate of fully certified solutions than a single-mode baseline at the same budget, consistently across two frontier LLM backends.

Certified program synthesis (aka vericoding) is the process of automatically generating a program, its formal specification, and a machine-checkable proof of their alignment from a natural-language description. Two challenges make vericoding difficult. First, specifications synthesised from natural language are often either too weak to be meaningful or too strong to be implementable, yet existing approaches lack systematic means to detect such defects. Second, the landscape of program verifiers is fragmented: each tool supports a particular reasoning mode -- auto-active (e.g., Dafny, Verus) or interactive (e.g., Coq, Lean) -- with its own trade-off between automation and expressivity. This forces every synthesis methodology to be tailored to a single verification paradigm, limiting the class of tasks it can handle effectively. We overcome both challenges by structuring the certified synthesis workflow around a multi-modal verifier -- a single tool combining dynamic validation, automated proofs, and interactive proof scripting in one foundational framework. We realise this idea in LeetProof, an agentic pipeline built on Velvet, a multi-modal verifier embedded in Lean. Multi-modality enables LeetProof to validate generated specifications via randomised property-based testing before any code is synthesised, decompose the synthesis task into sub-problems guided by verification conditions, and delegate residual proof obligations to frontier AI provers specialised for Lean. We evaluate LeetProof on benchmarks derived from prior work on certified synthesis. Our specification validation uncovers defects in existing reference benchmarks, and LeetProof's staged pipeline achieves a significantly higher rate of fully certified solutions than a single-mode baseline at the same budget -- consistently across two frontier LLM backends.
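To make the "too weak" and "too strong" defects concrete, here is a minimal Python sketch of validating a specification by randomised testing before any code is synthesised. It is an illustration only, not LeetProof's or Velvet's actual mechanism, which works inside Lean. The task (maximum of a non-empty list), the trusted oracle `reference_max`, the three candidate specifications, and the helper `validate_spec` are all hypothetical names introduced for this example. The sketch assumes a trusted reference output is available for each random input.

```python
import random

# Hypothetical trusted oracle for the task "maximum of a non-empty list".
def reference_max(xs):
    return max(xs)

# Candidate postconditions relating an input list xs to a result r.
def spec_good(xs, r):
    # r is an element of xs and no element exceeds it.
    return r in xs and all(x <= r for x in xs)

def spec_weak(xs, r):
    # Omits membership, so any upper bound is accepted.
    return all(x <= r for x in xs)

def spec_strong(xs, r):
    # Strict inequality cannot hold for r itself, so no result satisfies it.
    return r in xs and all(x < r for x in xs)

def validate_spec(spec, reference, trials=200, seed=0):
    """Search random inputs for evidence that spec is too strong or too weak."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]
        expected = reference(xs)
        # Too strong: the spec rejects a known-correct output.
        if not spec(xs, expected):
            return ("too strong", xs, expected)
        # Too weak: the spec accepts a deliberately wrong output.
        for wrong in (expected - 1, expected + 1):
            if spec(xs, wrong):
                return ("too weak", xs, wrong)
    return ("no defect found", None, None)

if __name__ == "__main__":
    for name, spec in [("good", spec_good), ("weak", spec_weak), ("strong", spec_strong)]:
        verdict, xs, r = validate_spec(spec, reference_max)
        print(name, "->", verdict, xs, r)
```

Passing this check is evidence rather than proof: "no defect found" only means that no counterexample turned up within the sampled inputs and the two simple output mutations tried here.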

