AICLPLJan 27, 2025

From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs

arXiv:2501.16207v430 citationsh-index: 49Has CodeACL
Originality Synthesis-oriented
AI Analysis

This work addresses the challenge of formal verification for software and system developers, offering an incremental improvement by applying existing methods to new data in formal reasoning.

The paper tackles the problem of converting natural language requirements into verifiable formal proofs by constructing 18,000 instruction-response pairs across five formal specification languages and evaluating ten LLMs, including fine-tuning small 7-8B models to achieve performance comparable to a 671B model.

The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) by distilling gpt-4o and evaluated against ten open-sourced LLMs, including recent popular DeepSeek-R1. We also fine-tuned several 7~8B small models to achieve comparable performance with Deepseek-R1-671B. Interestingly, we observed that fine-tuning with formal data also enhances mathematics, reasoning, and coding capabilities. Fine-tuned models are released at https: //huggingface.co/fm-universe.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes