cs.LO · Computer Science

Logic in CS

Formal logic, verification, model checking

17.3 · AI · May 22

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

Shubham Agarwal, Alexander Krentsel, Shu Liu et al.

For developers of safety-critical distributed systems, IDS dramatically reduces the effort and cost of formal verification, which previously required months to years of expert work.

19.7 · AI · Apr 15 · Code · 30

Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning

Xinglang Zhang, Yunyao Zhang, ZeLiang Chen et al.

For researchers and practitioners using LLMs for logical reasoning, this work reveals a fundamental limitation and offers a method to improve robustness at high complexity.

14.2 · AI · Jun 2

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

Ruida Wang, Jerry Huang, Pengcheng Wang et al.

For developers of LLM-based agent systems, this work provides a formal method to specify, verify, and debug multi-step workflows, addressing a critical lack of reliability in current agent systems.

12.2 · LG · Mar 13

TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?

Alexander K Taylor, Junyi Zhang, Ethan Ji et al.

This work addresses a gap in ATP robustness for research mathematics, where exploratory, prototype-heavy definitions are common; its contribution is incremental, highlighting one specific bottleneck.

6.2 · CL · Jun 3

Optimizing the Cost-Quality Tradeoff of Agentic Theorem Provers in Lean

Kári Rögnvaldsson, Chenhao Sun, Jasper Dekoninck et al.

For researchers using LLMs for formal theorem proving, this work provides a cost-aware method to reduce compute waste without sacrificing proof success rates.

14.9 · PL · Mar 13 · Code

Can LLMs Perform Synthesis?

Derek Egolf, Yuhao Zhou, Stavros Tripakis

This addresses the problem of evaluating LLMs' program-synthesis capabilities for AI and software engineering, finding that LLMs currently offer only incremental capability compared with specialized synthesis tools.

15.3 · LO · May 1 · Code

Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?

Romy Peled, Daniel Kroening, Michael Tautschnig et al.

For formal verification engineers, this approach automates part of the induction proof process, though it is incremental and requires reprompting.

13.3 · HC · Mar 16 · Code · 18

Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization

Banri Yanahama, Akiyoshi Sannai

This addresses the challenge of ensuring semantic correctness in large-scale AI-assisted formal mathematics, though it is incremental, building on existing proof assistant tools.

14.5 · AI · Apr 17

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

Chengwu Liu, Yichun Yin, Ye Yuan et al.

For researchers in automated theorem proving, this work provides a more realistic benchmark and a framework that exposes a large gap between answer discovery and formal proof, enabling better evaluation of AI reasoning.

24.9 · LG · Jun 25

Theory-Scale Auto-Formalization of Logics for Computer Science

Yuming Feng, Frederick Pu, One An et al.

This benchmark addresses the need for scalable, coherent auto-formalization of interdependent mathematical theories, a critical bottleneck for formal verification.

13.0 · CR · Apr 25

From Language to Logic: Bridging LLMs & Formal Representations for RTL Assertion Generation

Nowfel Mashnoor, Hadi Kamali, Kimia Azar

For hardware verification engineers, this work automates assertion generation with formal correctness guarantees, reducing manual effort and expertise requirements.

10.3 · LG · May 21

What are the Right Symmetries for Formal Theorem Proving?

Krzysztof Olejniczak, Radoslav Dimitrov, Xingyue Huang et al.

For the field of AI-driven formal theorem proving, this work identifies a key missing inductive bias (symmetry) and provides a practical method to compensate for its absence, though the approach is incremental.

10.5 · AI · May 28 · Code

Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Pedro Orvalho, Marta Kwiatkowska, Guillem Alenyà et al.

For users needing reliable optimisation from natural language descriptions, this method significantly improves correctness over direct-answer, chain-of-thought, and program-of-thought baselines.

8.6 · CL · Apr 18 · Code

Bolzano: Case Studies in LLM-Assisted Mathematical Research

Jan Grebík, Pavel Hubáček, Martin Koutecký et al.

For mathematicians and computer scientists, this work shows that LLMs can autonomously contribute publishable results, advancing the frontier of AI-assisted research.

13.7 · LO · Mar 14

Power Term Polynomial Algebra for Boolean Logic

Emanuele Sansone, Armando Solar-Lezama

This provides a new intermediate representation for bridging clause-based and algebraic reasoning in Boolean logic, though it appears incremental as it builds on existing CNF and ANF frameworks.

9.8 · LG · Mar 20

Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

Guillaume Baudart, Marc Lelarge, Tristan Stérin et al.

This work addresses the challenge of automated theorem proving in competitive mathematics, representing an incremental advance by applying existing methods to new data.

15.3 · LO · Apr 26 · Code · 5

The Network Structure of Mathlib

Xinze Li, Nanyun Peng, Simone Severini et al.

For developers of formal mathematics libraries, this work quantifies structural inefficiencies and mismatches between human-designed taxonomies and logical dependencies.

15.8 · LO · May 19

Pseudo-Formalization for Automatic Proof Verification

Slim Barkallah, Luke Bailey, Kaiyue Wen et al.

For AI systems and researchers working on automated proof verification in mathematics, this work provides a practical format and verification method that improves over existing LLM-based judges.

15.3 · LO · May 18

Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Jialin Lu, Soonho Kong, Rodrigo Stehling et al.

This work addresses the practical need for multi-objective proof optimization in the Lean theorem prover community, where LLM-generated proofs tend to be verbose and brittle across Lean versions.

11.5 · PL · Mar 15 · Code · 2

s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

Balaji Rao, John Harrison, Soonho Kong et al.

This addresses the problem of assessing LLM-based theorem proving for practical, low-level code in cryptography, offering a novel benchmark for researchers in automated reasoning and AI.