28.6PLMay 21
Formally Verified Linear-Time Invertible LexingSamuel Chassot, Viktor Kunčak
We present ZipLex, a verified framework for invertible linear-time lexical analysis following the longest match (maximal munch) semantics. Unlike past verified lexers that focus only on satisfying the semantics of regular expressions and the longest match property, ZipLex also guarantees that lexing and printing are mutual inverses. Thanks to verified memoization, it also ensures that the lexical analysis of a string is linear in the size of the string. Our design and implementation rely on two sets of ideas: (1) a new abstraction of token sequences that captures the separability of tokens in a sequence while supporting their efficient manipulation, and (2) a combination of verified data structures and optimizations, including Huet's zippers and memoization with a standalone verified imperative hash table. Our hash table offers competitive performance as shown by our evaluation. We implemented and verified ZipLex using the Stainless deductive verifier for Scala. Our evaluation demonstrates that ZipLex supports realistic applications such as JSON processing and lexers of programming languages, and behaves linearly even in cases that make flex-style approaches quadratic. ZipLex is two orders of magnitude faster than Verbatim++, showing that verified invertibility and linear-time algorithms can be developed without prohibitive cost. Compared to Coqlex, ZipLex also offers linear (instead of quadratic) time lexing, and is the first lexer that comes with invertibility proofs for printing token sequences.
79.5LOMay 14
Orthologic for SAT SolvingVladislas de Haldat, Simon Guilloud, Viktor Kunčak
We present a new algorithm for deciding formula entailment in orthologic (a sound approximation of classical logic) that avoids the costly preprocessing phase of prior implementations while retaining the same $\mathcal{O}(n^2(1+|A|))$ worst-case complexity. We then introduce a family of synthetic SAT benchmarks based on the observation that, for any formula $ϕ$, the equivalence $ϕ\leftrightarrow \mathrm{NF}_{\mathrm{OL}}(ϕ)$ is a tautology whose Tseitin encoding yields unsatisfiable instances that are hard for state-of-the-art SAT solvers yet have short orthologic proofs. Applied to EPFL arithmetic circuits, our algorithm solves these instances efficiently while Kissat times out on a significant fraction. Finally, we show that using orthologic normalization as a preprocessing step can improve SAT solving time on some hard problems.
23.5PLMay 8
First-Class Refinement Types for ScalaMatt Bovel, Viktor Kunčak, Martin Odersky
Refinement types -- types qualified with logical predicates -- have proven effective for lightweight verification in languages like Liquid Haskell, F*, and Dafny. However, in these systems refinements are either written in a separate specification language or treated as second-class annotations, disconnected from the host language's type system. This disconnect creates usability barriers: programmers must maintain two mental models, and refinements cannot interact with features like type inference, subtyping, or overloading. We present the design of first-class refinement types for Scala 3, where refinements are ordinary types that participate in subtyping, inference, and pattern matching alongside existing language features. We prove type soundness of a core calculus mechanized in Rocq, combining dependent function types, bounded polymorphism, positive equi-recursive types, union and intersection types, and refinement types under a partial-correctness semantics using a fuel-bounded definitional interpreter and semantic typing. Finally, we implement our design as a prototype extension of the Scala 3 compiler with a lightweight e-graph-based solver for predicate entailment.
CLOct 29, 2025
RLMEval: Evaluating Research-Level Neural Theorem ProvingAuguste Poiroux, Antoine Bosselut, Viktor Kunčak
Despite impressive results on curated benchmarks, the practical impact of large language models (LLMs) on research-level neural theorem proving and proof autoformalization is still limited. We introduce RLMEval, an evaluation suite for these tasks, focusing on research-level mathematics from real-world Lean formalization projects. RLMEval targets the evaluation of neural theorem proving and proof autoformalization on challenging research-level theorems by leveraging real Lean Blueprint formalization projects. Our evaluation of state-of-the-art models on RLMEval, comprising 613 theorems from 6 Lean projects, reveals a significant gap: progress on existing benchmarks does not readily translate to these more realistic settings, with the best model achieving only a 10.3 % pass rate. RLMEval provides a new, challenging benchmark designed to guide and accelerate progress in automated reasoning for formal mathematics.
CLJun 11, 2024
Reliable Evaluation and Benchmarks for Statement AutoformalizationAuguste Poiroux, Gail Weiss, Viktor Kunčak et al.
Evaluating statement autoformalization, translating natural language mathematics into formal languages like Lean 4, remains a significant challenge, with few metrics, datasets, and standards to robustly measure progress. In this work, we present a comprehensive approach combining improved metrics, robust benchmarks, and systematic evaluation, to fill this gap. First, we introduce BEq+, an automated metric that correlates strongly with human judgment, along with ProofNetVerif, a new dataset for assessing the quality of evaluation metrics, containing 3,752 annotated examples. Second, we develop two new autoformalization benchmarks: ProofNet#, a corrected version of ProofNet, and RLM25, with 619 new pairs of research-level mathematics from six formalization projects. Through systematic experimentation across these benchmarks, we find that current techniques can achieve up to 45.1% accuracy on undergraduate mathematics but struggle with research-level content without proper context. Our work establishes a reliable foundation for evaluating and advancing autoformalization systems.
SEJul 16, 2021
Verifying a Realistic Mutable Hash TableSamuel Chassot, Viktor Kunčak
In this work, we verify the mutable LongMap from the Scala standard library, a hash table using open addressing within a single array, using the Stainless program verifier. As a reference implementation, we write an immutable map based on a list of tuples. We then show that LongMap's operations correspond to operations of this association list. To express the resizing of the hash table array, we introduce a new reference swapping construct in Stainless. This allows us to apply the decorator pattern without introducing aliasing. Our verification effort led us to find and fix a bug in the original implementation that manifests for large hash tables. Our performance analysis shows the verified version to be within a 1.5 factor of the original data structure.
AIFeb 6, 2019
Neural-Network Guided Expression TransformationRomain Edelmann, Viktor Kunčak
Optimizing compilers, as well as other translator systems, often work by rewriting expressions according to equivalence preserving rules. Given an input expression and its optimized form, finding the sequence of rules that were applied is a non-trivial task. Most of the time, the tools provide no proof, of any kind, of the equivalence between the original expression and its optimized form. In this work, we propose to reconstruct proofs of equivalence of simple mathematical expressions, after the fact, by finding paths of equivalence preserving transformations between expressions. We propose to find those sequences of transformations using a search algorithm, guided by a neural network heuristic. Using a Tree-LSTM recursive neural network, we learn a distributed representation of expressions where the Manhattan distance between vectors approximately corresponds to the rewrite distance between expressions. We then show how the neural network can be efficiently used to search for transformation paths, leading to substantial gain in speed compared to an uninformed exhaustive search. In one of our experiments, our neural-network guided search algorithm is able to solve more instances with a 2 seconds timeout per instance than breadth-first search does with a 5 minutes timeout per instance.