Xinglang Zhang, Yunyao Zhang, ZeLiang Chen et al.
For researchers and practitioners using LLMs for logical reasoning, this work reveals a fundamental limitation and offers a method to improve robustness at high complexity.
Formal logic, verification, model checking
Shubham Agarwal, Alexander Krentsel, Shu Liu et al.
For developers of safety-critical distributed systems, IDS dramatically reduces the effort and cost of formal verification, which previously required months to years of expert work.
Chengwu Liu, Yichun Yin, Ye Yuan et al.
For researchers in automated theorem proving, this work provides a more realistic benchmark and a framework that exposes a large gap between answer discovery and formal proof, enabling better evaluation of AI reasoning.
Jan Grebík, Pavel Hubáček, Martin Koutecký et al.
For mathematicians and computer scientists, this work shows that LLMs can autonomously contribute publishable results, advancing the frontier of AI-assisted research.
Ruida Wang, Jerry Huang, Pengcheng Wang et al.
For developers of LLM-based agent systems, this work provides a formal method to specify, verify, and debug multi-step workflows, addressing a critical lack of reliability in current agent systems.
Zhe Ye, Aidan Z. H. Yang, Huangyuan Su et al.
For developers and verification engineers, this reduces the expertise and cost of writing formal specifications, but the approach is incremental, combining known techniques (LLMs, traceability, repair) in a new pipeline.
Alexander K Taylor, Junyi Zhang, Ethan Ji et al.
This work addresses a gap in ATP robustness for research mathematics, where exploratory and prototype-heavy definitions are common, though its contribution is incremental, limited to highlighting one specific bottleneck.
Kyuhee Kim, Auguste Poiroux, Antoine Bosselut
For researchers using LLMs for formal verification, this work highlights that high compilation rates do not guarantee faithful reasoning, revealing a critical gap in current evaluation practices.
Derek Egolf, Yuhao Zhou, Stavros Tripakis
This work addresses the problem of evaluating LLMs' program-synthesis capabilities for AI and software engineering, showing that LLMs currently offer only incremental value compared to specialized tools.
Romy Peled, Daniel Kroening, Michael Tautschnig et al.
For formal verification engineers, this approach automates part of the induction proof process, though it is incremental and requires reprompting.
Slim Barkallah, Luke Bailey, Kaiyue Wen et al.
For AI systems and researchers working on automated proof verification in mathematics, this work provides a practical format and verification method that improves over existing LLM-based judges.
Xinze Li, Nanyun Peng, Simone Severini et al.
For developers of formal mathematics libraries, this work quantifies structural inefficiencies and mismatches between human-designed taxonomies and logical dependencies.
Nowfel Mashnoor, Hadi Kamali, Kimia Azar
For hardware verification engineers, this work automates assertion generation with formal correctness guarantees, reducing manual effort and expertise requirements.
Viresh Pati, Zhengyu Li, Piyush Jha et al.
For researchers evaluating LLM reasoning and tool use, MathConstraint provides a non-saturating, verifiable benchmark with tunable difficulty, revealing sensitivity to tool-call budgets that fixed benchmarks miss.
Jialin Lu, Soonho Kong, Rodrigo Stehling et al.
For the Lean theorem prover community, this work addresses the practical need for multi-objective proof optimization, since LLM-generated proofs are verbose and brittle across versions.
Banri Yanahama, Akiyoshi Sannai
This addresses the challenge of ensuring semantic correctness in large-scale AI-assisted formal mathematics, though it is incremental by building on existing proof assistant tools.
Kári Rögnvaldsson, Chenhao Sun, Jasper Dekoninck et al.
For researchers using LLMs for formal theorem proving, this work provides a cost-aware method to reduce compute waste without sacrificing proof success rates.
Isaac David, Marco Guarnieri, Arthur Gervais
This work provides a foundational formal framework for specifying and enforcing behavioral boundaries in agentic security systems, addressing a critical safety problem for developers and operators of such systems.
Krzysztof Olejniczak, Radoslav Dimitrov, Xingyue Huang et al.
For the field of AI-driven formal theorem proving, this work identifies a key missing inductive bias (symmetry) and provides a practical method to mitigate it, though the approach is incremental.
Pedro Orvalho, Marta Kwiatkowska, Guillem Alenyà et al.
For users needing reliable optimisation from natural language descriptions, this method significantly improves correctness over direct-answer, chain-of-thought, and program-of-thought baselines.