SEMay 9

Reducing the Costs of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set

Nongyu Di, Tianyu Chen, Shan Lu, Shuai Lu, Yeyun Gong, Peng Cheng, Jacob R. Lorch, Yuan Yao, Xiaoxing Ma

arXiv:2602.0491090.91 citationsh-index: 6

AI Analysis

For developers using formal verification in Rust, this work reduces the data scarcity bottleneck for proof synthesis, achieving large-scale data generation and improved model performance.

VeruSyn synthesizes 6.9 million Verus-verified Rust programs with proofs, enabling a fine-tuned 32B model that outperforms o4-mini and prior research models while offering better cost-proof tradeoffs than Claude Sonnet 4.5.

Large Language Models (LLMs) are widely used for code generation. However, the correctness of code generated by LLMs remains a concern. A potential remedy to this concern is to have LLMs generate formal correctness proofs along with such code. However, compared with code generation, code-proof generation requires much higher reasoning capability and has much less existing data to learn from. In this paper, we present VeruSyn, a data synthesis pipeline for Verus, a state-of-the-art verification tool for system software written in Rust. Through self-synthesis and tutorial-based synthesis, VeruSyn achieves much larger scale and Verus-feature coverage than previous data-synthesis techniques designed for Verus; VeruSyn also supplements its dataset with long-chain-of-thought (CoT) data through agent trajectory synthesis. With VeruSyn, we synthesize the largest set of Verus verified programs: 6.9 million Rust programs, each with a formal specification and a proof that it meets that specification. This dataset lets us create a fine-tuned Qwen2.5-Coder-32B-Instruct model with appealing cost-proof tradeoff compared with state-of-the-art commercial models like Claude Sonnet 4.5. It also significantly outperforms models like o4-mini and previously proposed research models.

View on arXiv PDF

Similar