AILGOct 30, 2025

Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

arXiv:2510.26094v15 citationsh-index: 9
Originality Incremental advance
AI Analysis

This work addresses the need for formal verification tools in physics education and research, though it is incremental as it builds on existing Lean4 and benchmark methodologies.

The authors tackled the problem of formal reasoning in college-level physics by creating Lean4PHYS, a framework including a benchmark (LeanPhysBench) and a foundational library (PhysLib), with baseline results showing best model performance at 35% and an 11.75% average improvement from PhysLib.

We present **Lean4PHYS**, a comprehensive reasoning framework for college-level physics problems in Lean4. **Lean4PHYS** includes *LeanPhysBench*, a college-level benchmark for formal physics reasoning in Lean4, which contains 200 hand-crafted and peer-reviewed statements derived from university textbooks and physics competition problems. To establish a solid foundation for formal reasoning in physics, we also introduce *PhysLib*, a community-driven repository containing fundamental unit systems and theorems essential for formal physics reasoning. Based on the benchmark and Lean4 repository we composed in **Lean4PHYS**, we report baseline results using major expert Math Lean4 provers and state-of-the-art closed-source models, with the best performance of DeepSeek-Prover-V2-7B achieving only 16% and Claude-Sonnet-4 achieving 35%. We also conduct a detailed analysis showing that our *PhysLib* can achieve an average improvement of 11.75% in model performance. This demonstrates the challenging nature of our *LeanPhysBench* and the effectiveness of *PhysLib*. To the best of our knowledge, this is the first study to provide a physics benchmark in Lean4.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes