Josef Urban

AI
h-index: 31
Papers: 54
Citations: 1,953
Novelty: 47%
AI Score: 43

54 Papers

AI · May 4, 2022
The Isabelle ENIGMA

Zarathustra A. Goertzel, Jan Jakubův, Cezary Kaliszyk et al.

We significantly improve the performance of the E automated theorem prover on the Isabelle Sledgehammer problems by combining learning and theorem proving in several ways. In particular, we develop targeted versions of the ENIGMA guidance for the Isabelle problems, targeted versions of neural premise selection, and targeted strategies for E. The methods are trained in several iterations over hundreds of thousands of untyped and typed first-order problems extracted from Isabelle. Our final best single-strategy ENIGMA and premise selection system improves the best previous version of E by 25.3% in 15 seconds, also outperforming all other previous ATP and SMT systems.

AI · Mar 12, 2023
MizAR 60 for Mizar 50

Jan Jakubův, Karel Chvalovský, Zarathustra Goertzel et al.

As a present to Mizar on its 50th anniversary, we develop an AI/TP system that automatically proves about 60% of the Mizar theorems in the hammer setting. We also automatically prove 75% of the Mizar theorems when the automated provers are helped by using only the premises used in the human-written Mizar proofs. We describe the methods and large-scale experiments leading to these results. This includes in particular the E and Vampire provers, their ENIGMA and Deepire learning modifications, a number of learning-based premise selection methods, and the incremental loop that interleaves growing a corpus of millions of ATP proofs with training increasingly strong AI/TP systems on them. We also present a selection of Mizar problems that were proved automatically.

LG · Oct 7, 2022
Machine Learning Meets The Herbrand Universe

Jelle Piepenbrock, Josef Urban, Konstantin Korovin et al.

The appearance of strong CDCL-based propositional (SAT) solvers has greatly advanced several areas of automated reasoning (AR). One of the directions in AR is thus to apply SAT solvers to expressive formalisms such as first-order logic, for which large corpora of general mathematical problems exist today. This is possible due to Herbrand's theorem, which allows reduction of first-order problems to propositional problems by instantiation. The core challenge is choosing the right instances from the typically infinite Herbrand universe. In this work, we develop the first machine learning system targeting this task, addressing its combinatorial and invariance properties. In particular, we develop a GNN2RNN architecture based on an invariant graph neural network (GNN) that learns from problems and their solutions independently of symbol names (addressing the abundance of skolems), combined with a recurrent neural network (RNN) that proposes for each clause its instantiations. The architecture is then trained on a corpus of mathematical problems and their instantiation-based proofs, and its performance is evaluated in several ways. We show that the trained system achieves high accuracy in predicting the right instances, and that it is capable of solving many problems by educated guessing when combined with a ground solver. To our knowledge, this is the first convincing use of machine learning in synthesizing relevant elements from arbitrary Herbrand universes.
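To make the notion of a Herbrand universe concrete, here is a minimal sketch that enumerates ground terms up to a fixed nesting depth. It is only an illustration of why the universe grows quickly and why choosing instances is hard; it is not the GNN2RNN system from the paper. The function name `herbrand_universe` and the example signature (`a`, `f`, `g`) are assumptions made for this example.

```python
from itertools import product


def herbrand_universe(constants, functions, depth):
    """Return the set of ground terms with nesting depth <= depth.

    constants: iterable of constant names, e.g. ["a"]
    functions: dict mapping a function name to its arity, e.g. {"f": 1}
    Constants are strings; compound terms are tuples (name, arg1, ...).
    """
    terms = set(constants)
    for _ in range(depth):
        new_terms = set(terms)
        for name, arity in functions.items():
            # Apply each function symbol to every tuple of known terms.
            for args in product(terms, repeat=arity):
                new_terms.add((name, *args))
        terms = new_terms
    return terms


# Hypothetical signature: one constant, one unary and one binary function.
level0 = herbrand_universe(["a"], {"f": 1, "g": 2}, 0)
level1 = herbrand_universe(["a"], {"f": 1, "g": 2}, 1)
level2 = herbrand_universe(["a"], {"f": 1, "g": 2}, 2)
print(len(level0), len(level1), len(level2))  # prints "1 3 13"
```

With any function symbol of arity at least one, the set never stops growing as the depth increases, which is why an instantiation-based prover needs some way to pick a small number of promising terms.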

25.6 · AI · Apr 8
Munkres' General Topology Autoformalized in Isabelle/HOL

Dustin Bryant, Jonathan Julián Huerta y Munive, Cezary Kaliszyk et al.

We describe an experiment in LLM-assisted autoformalization that produced over 85,000 lines of Isabelle/HOL code covering all 39 sections of Munkres' Topology (general topology, Chapters 2–8), from topological spaces through dimension theory. The LLM-based coding agents (initially ChatGPT 5.2 and then Claude Opus 4.6) used 24 active days for that. The formalization is complete: all 806 formal results are fully proved with zero sorry's. Proved results include the Tychonoff theorem, the Baire category theorem, the Nagata–Smirnov and Smirnov metrization theorems, the Stone–Čech compactification, Ascoli's theorem, the space-filling curve, and others. The methodology is based on a "sorry-first" declarative proof workflow combined with bulk use of Sledgehammer, two of Isabelle's major strengths. This leads to relatively fast autoformalization progress. We analyze the resulting formalization in detail, analyze the human–LLM interaction patterns from the session log, and briefly compare with related autoformalization efforts in Megalodon, HOL Light, and Naproche. The results indicate that LLM-assisted formalization of standard mathematical textbooks in Isabelle/HOL is quite feasible, cheap and fast, even if some human supervision is useful.

AI · Aug 26, 2024
Machine Learning for Quantifier Selection in cvc5

Jan Jakubův, Mikoláš Janota, Jelle Piepenbrock et al.

In this work we considerably improve state-of-the-art SMT solving on first-order quantified problems by efficient machine learning guidance of quantifier selection. Quantifiers represent a significant challenge for SMT and are technically a source of undecidability. In our approach, we train an efficient machine learning model that informs the solver which quantifiers should be instantiated and which should not. Each quantifier may be instantiated multiple times, and the set of active quantifiers changes as the solving progresses. Therefore, we invoke the ML predictor many times during the whole run of the solver. To make this efficient, we use fast ML models based on gradient-boosted decision trees. We integrate our approach into the state-of-the-art cvc5 SMT solver and show a considerable increase of the system's holdout-set performance after training it on a large set of first-order problems collected from the Mizar Mathematical Library.

AI · Jan 27, 2023
Alien Coding

Thibault Gauthier, Miroslav Olšák, Josef Urban

We introduce a self-learning algorithm for synthesizing programs for OEIS sequences. The algorithm starts from scratch, initially generating programs at random. Then it runs many iterations of a self-learning loop that interleaves (i) training neural machine translation to learn the correspondence between sequences and the programs discovered so far, and (ii) proposing many new programs for each OEIS sequence by the trained neural machine translator. The algorithm discovers on its own programs for more than 78000 OEIS sequences, sometimes developing unusual programming methods. We analyze its behavior and the invented programs in several experiments.

AI · Mar 6, 2024
Learning Guided Automated Reasoning: A Brief Survey

Lasse Blaauwbroek, David Cerna, Thibault Gauthier et al.

Automated theorem provers and formal proof assistants are general reasoning systems that are in theory capable of proving arbitrarily hard theorems, thus solving arbitrary problems reducible to mathematics and logical reasoning. In practice, however, such systems face a large combinatorial explosion, and therefore include many heuristics and choice points that considerably influence their performance. This is an opportunity for trained machine learning predictors, which can guide the work of such reasoning systems. Conversely, deductive search supported by the notion of logically valid proof allows one to train machine learning systems on large reasoning corpora. Such bodies of proof are usually correct by construction and when combined with more and more precise trained guidance they can be bootstrapped into very large corpora, with increasingly long reasoning chains and possibly novel proof ideas. In this paper we provide an overview of several automated reasoning and theorem proving domains and the learning and AI methods that have so far been developed for them. These include premise selection, proof guidance in several settings, AI systems and feedback loops iterating between reasoning and learning, and symbolic classification problems.

AI · Mar 3, 2025
Learning Conjecturing from Scratch

Thibault Gauthier, Josef Urban

We develop a self-learning approach for conjecturing of induction predicates on a dataset of 16197 problems derived from the OEIS. These problems are hard for today's SMT and ATP systems because they require a combination of inductive and arithmetical reasoning. Starting from scratch, our approach consists of a feedback loop that iterates between (i) training a neural translator to learn the correspondence between the problems solved so far and the induction predicates useful for them, (ii) using the trained neural system to generate many new induction predicates for the problems, (iii) fast runs of the Z3 prover attempting to prove the problems using the generated predicates, and (iv) using heuristics such as predicate size and solution speed on the proved problems to choose the best predicates for the next iteration of training. The algorithm discovers on its own many interesting induction predicates, ultimately solving 5565 problems, compared to 2265 problems solved by CVC5, Vampire or Z3 in 60 seconds.

AI · Jun 25, 2024
Solving Hard Mizar Problems with Instantiation and Strategy Invention

Jan Jakubův, Mikoláš Janota, Josef Urban

In this work, we prove over 3000 previously ATP-unproved Mizar/MPTP problems by using several ATP and AI methods, raising the number of ATP-solved Mizar problems from 75% to above 80%. First, we start to experiment with the cvc5 SMT solver, which uses several instantiation-based heuristics that differ from the superposition-based systems that were previously applied to Mizar, and add many new solutions. Then we use automated strategy invention to develop cvc5 strategies that largely improve cvc5's performance on the hard problems. In particular, the best invented strategy solves over 14% more problems than the best previously available cvc5 strategy. We also show that different clausification methods have a high impact on such instantiation-based methods, again producing many new solutions. In total, the methods solve 3021 (21.3%) of the 14163 previously unsolved hard Mizar problems. This is a new milestone over the Mizar large-theory benchmark and a large strengthening of the hammer methods for Mizar.

AI · May 13, 2023
Translating SUMO-K to Higher-Order Set Theory

Chad Brown, Adam Pease, Josef Urban

We describe a translation from a fragment of SUMO (SUMO-K) into higher-order set theory. The translation provides a formal semantics for portions of SUMO which are beyond first-order and which have previously only had an informal interpretation. It also for the first time embeds a large common-sense ontology into a very secure interactive theorem proving system. We further extend our previous work in finding contradictions in SUMO from first-order constructs to include a portion of SUMO's higher-order constructs. Finally, using the translation, we can create problems that can be proven using higher-order interactive and automated theorem provers. This is tested in several systems and can be used to form a corpus of higher-order common-sense reasoning problems.

AI · Feb 24, 2022
Learning Program Synthesis for Integer Sequences from Scratch

Thibault Gauthier, Josef Urban

We present a self-learning approach for synthesizing programs from integer sequences. Our method relies on a tree search guided by a learned policy. Our system is tested on the On-Line Encyclopedia of Integer Sequences. There, it discovers, on its own, solutions for 27987 sequences starting from basic operators and without human-written training examples.

LO · Jul 21, 2021
Learning Theorem Proving Components

Karel Chvalovský, Jan Jakubův, Miroslav Olšák et al.

Saturation-style automated theorem provers (ATPs) based on the given clause procedure are today the strongest general reasoners for classical first-order logic. The clause selection heuristics in such systems are, however, often evaluating clauses in isolation, ignoring other clauses. This has changed recently by equipping the E/ENIGMA system with a graph neural network (GNN) that chooses the next given clause based on its evaluation in the context of previously selected clauses. In this work, we describe several algorithms and experiments with ENIGMA, advancing the idea of contextual evaluation based on learning important components of the graph of clauses.
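For readers unfamiliar with the given clause procedure, the following is a toy sketch for propositional resolution only. It is not E or ENIGMA: the names `resolve`, `given_clause_loop` and the `select` parameter are assumptions for this illustration, and `select` merely marks the place where a clause selection heuristic (hand-written or learned) would plug in. Clauses are sets of non-zero integers, with a negative integer denoting a negated variable.

```python
def resolve(c1, c2):
    """Return all non-tautological resolvents of two propositional clauses."""
    resolvents = []
    for lit in c1:
        if -lit in c2:
            resolvent = (c1 - {lit}) | (c2 - {-lit})
            # Discard tautologies such as {p, -p, q}.
            if not any(-l in resolvent for l in resolvent):
                resolvents.append(frozenset(resolvent))
    return resolvents


def given_clause_loop(clauses, select=len):
    """Saturate the clause set; return True if the empty clause is derived.

    select: scoring function, lower is better. The default prefers short
    clauses, a simple stand-in for a real selection heuristic.
    """
    unprocessed = {frozenset(c) for c in clauses}
    processed = set()
    while unprocessed:
        # Pick the "given clause"; sorting the literals breaks ties.
        given = min(unprocessed, key=lambda c: (select(c), sorted(c)))
        unprocessed.remove(given)
        if not given:
            return True  # empty clause: the input is unsatisfiable
        processed.add(given)
        # Perform all inferences between the given clause and processed ones.
        for other in list(processed):
            for r in resolve(given, other):
                if r not in processed:
                    unprocessed.add(r)
    return False  # saturated without deriving the empty clause


unsat = given_clause_loop([[1, 2], [-1, 2], [-2]])
sat = given_clause_loop([[1, 2], [-1]])
print(unsat, sat)  # prints "True False"
```

Note that the default `select` scores each clause in isolation, which is exactly the limitation the abstract describes; a contextual heuristic would also look at the clauses already in `processed`.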

AI · Jul 14, 2021
Fast and Slow Enigmas and Parental Guidance

Zarathustra Goertzel, Karel Chvalovský, Jan Jakubův et al.

We describe several additions to the ENIGMA system that guides clause selection in the E automated theorem prover. First, we significantly speed up its neural guidance by adding server-based GPU evaluation. The second addition is motivated by fast weight-based rejection filters that are currently used in systems like E and Prover9. Such systems can be made more intelligent by instead training fast versions of ENIGMA that implement more intelligent pre-filtering. This results in combinations of trainable fast and slow thinking that improve over both the fast-only and slow-only methods. The third addition is based on "judging the children by their parents", i.e., possibly rejecting an inference before it produces a clause. This is motivated by standard evolutionary mechanisms, where there is always a cost to producing all possible offspring in the current population. This saves time by not evaluating all clauses by more expensive methods and provides a complementary view of the generated clauses. The methods are evaluated on a large benchmark coming from the Mizar Mathematical Library, showing good improvements over the state of the art.

AI · May 31, 2021
The Role of Entropy in Guiding a Connection Prover

Zsolt Zombori, Josef Urban, Miroslav Olšák

In this work we study how to learn good algorithms for selecting reasoning steps in theorem proving. We explore this in the connection tableau calculus implemented by leanCoP where the partial tableau provides a clean and compact notion of a state to which a limited number of inferences can be applied. We start by incorporating a state-of-the-art learning algorithm, a graph neural network (GNN), into the plCoP theorem prover. Then we use it to observe the system's behaviour in a reinforcement learning setting, i.e., when learning inference guidance from successful Monte-Carlo tree searches on many problems. Despite its better pattern matching capability, the GNN initially performs worse than a simpler previously used learning algorithm. We observe that the simpler algorithm is less confident, i.e., its recommendations have higher entropy. This leads us to explore how the entropy of the inference selection implemented via the neural network influences the proof search. This is related to research in human decision-making under uncertainty, and in particular the probability matching theory. Our main result shows that a proper entropy regularisation, i.e., training the GNN not to be overconfident, greatly improves plCoP's performance on a large mathematical corpus.
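The idea of entropy regularisation can be illustrated with a small sketch. The abstract does not give the exact loss used with plCoP, so the form below (cross-entropy minus a weighted entropy bonus) is a common textbook variant and an assumption here, as are the names `regularised_loss` and the weight `beta`.

```python
import math


def softmax(logits):
    """Turn raw scores over candidate inferences into probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less confident policy."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def regularised_loss(logits, target, beta=0.1):
    """Cross-entropy on the target inference minus beta times the entropy.

    beta is a hypothetical weight: beta = 0 gives plain cross-entropy,
    larger values penalise overconfident (low-entropy) predictions.
    """
    probs = softmax(logits)
    return -math.log(probs[target]) - beta * entropy(probs)


confident = softmax([5.0, 0.0, 0.0])
hedged = softmax([1.0, 0.0, 0.0])
uniform = softmax([0.0, 0.0, 0.0])
print(entropy(confident) < entropy(hedged) < entropy(uniform))  # prints "True"
```

Because the entropy term is subtracted, a policy that spreads some probability over alternative inferences is penalised less than one that puts nearly all mass on a single choice.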

LO · Apr 12, 2021
Online Machine Learning Techniques for Coq: A Comparison

Liao Zhang, Lasse Blaauwbroek, Bartosz Piotrowski et al.

We present a comparison of several online machine learning techniques for tactical learning and proving in the Coq proof assistant. This work builds on top of Tactician, a plugin for Coq that learns from proofs written by the user to synthesize new proofs. Learning happens in an online manner, meaning that Tactician's machine learning model is updated immediately every time the user performs a step in an interactive proof. This has important advantages compared to the more studied offline learning systems: (1) it provides the user with a seamless, interactive experience with Tactician, and (2) it takes advantage of locality of proof similarity, which means that proofs similar to the current proof are likely to be found close by. We implement two online methods, namely approximate k-nearest neighbors based on locality sensitive hashing forests and random decision forests. Additionally, we conduct experiments with gradient boosted trees in an offline setting using XGBoost. We compare the relative performance of Tactician using these three learning methods on Coq's standard library.
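The online setting can be sketched in a few lines: the model is updated after every proof step and is immediately usable for the next prediction. The sketch below uses exact k-nearest neighbours with Jaccard similarity over feature sets, not the approximate locality-sensitive-hashing forests the paper describes, and it is not Tactician's code. The class name `OnlineKNN`, the feature strings and the tactic names are all hypothetical.

```python
from collections import Counter


class OnlineKNN:
    """Exact k-NN tactic predictor that learns one example at a time."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (feature set, tactic name)

    def learn(self, features, tactic):
        """Online update: store the new example, no retraining needed."""
        self.examples.append((frozenset(features), tactic))

    def predict(self, features):
        """Return the majority tactic among the k most similar examples."""
        query = frozenset(features)

        def jaccard(a, b):
            union = a | b
            return len(a & b) / len(union) if union else 0.0

        ranked = sorted(
            self.examples, key=lambda ex: jaccard(query, ex[0]), reverse=True
        )
        votes = Counter(tactic for _, tactic in ranked[: self.k])
        return votes.most_common(1)[0][0] if votes else None


model = OnlineKNN(k=1)
before = model.predict({"nat", "succ"})          # nothing learned yet
model.learn({"nat", "plus"}, "induction")
model.learn({"list", "append"}, "simpl")
first = model.predict({"nat", "succ"})           # closest to {"nat", "plus"}
model.learn({"nat", "succ"}, "auto")
second = model.predict({"nat", "succ"})          # exact match just learned
print(before, first, second)  # prints "None induction auto"
```

The last two predictions show the locality effect mentioned in the abstract: an example learned a moment ago immediately changes the suggestion for a similar proof state.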

LG · Feb 10, 2021
Learning Equational Theorem Proving

Jelle Piepenbrock, Tom Heskes, Mikoláš Janota et al.

We develop Stratified Shortest Solution Imitation Learning (3SIL) to learn equational theorem proving in a deep reinforcement learning (RL) setting. The self-trained models achieve state-of-the-art performance in proving problems generated by one of the top open conjectures in quasigroup theory, the Abelian Inner Mapping (AIM) Conjecture. To develop the methods, we first use two simpler arithmetic rewriting tasks that share tree-structured proof states and sparse rewards with the AIM problems. On these tasks, 3SIL is shown to significantly outperform several established RL and imitation learning methods. The final system is then evaluated in a standalone and cooperative mode on the AIM problems. The standalone 3SIL-trained system proves in 60 seconds more theorems (70.2%) than the complex, hand-engineered Waldmeister system (65.5%). In the cooperative mode, the final system is combined with the Prover9 system, proving in 2 seconds what standalone Prover9 proves in 60 seconds.

AI · Jul 31, 2020
The Tactician (extended version): A Seamless, Interactive Tactic Learner and Prover for Coq

Lasse Blaauwbroek, Josef Urban, Herman Geuvers

We present Tactician, a tactic learner and prover for the Coq Proof Assistant. Tactician helps users make tactical proof decisions while they retain control over the general proof strategy. To this end, Tactician learns from previously written tactic scripts and gives users either suggestions about the next tactic to be executed or altogether takes over the burden of proof synthesis. Tactician's goal is to provide users with a seamless, interactive, and intuitive experience together with robust and adaptive proof automation. In this paper, we give an overview of Tactician from the user's point of view, regarding both day-to-day usage and issues of package dependency management while learning in the large. Finally, we give a peek into Tactician's implementation as a Coq plugin and machine learning platform.

AI · May 29, 2020
First Neural Conjecturing Datasets and Experiments

Josef Urban, Jan Jakubův

We describe several datasets and first experiments with creating conjectures by neural methods. The datasets are based on the Mizar Mathematical Library processed in several forms and the problems extracted from it by the MPTP system and proved by the E prover using the ENIGMA guidance. The conjecturing experiments use the Transformer architecture and in particular its GPT-2 implementation.

LO · Apr 15, 2020
Prolog Technology Reinforcement Learning Prover

Zsolt Zombori, Josef Urban, Chad E. Brown

We present a reinforcement learning toolkit for experiments with guiding automated theorem proving in the connection calculus. The core of the toolkit is a compact and easy to extend Prolog-based automated theorem prover called plCoP. plCoP builds on the leanCoP Prolog implementation and adds learning-guided Monte-Carlo Tree Search as done in the rlCoP system. Other components include a Python interface to plCoP and machine learners, and an external proof checker that verifies the validity of plCoP proofs. The toolkit is evaluated on two benchmarks and we demonstrate its extendability by two additions: (1) guidance is extended to reduction steps and (2) the standard leanCoP calculus is extended with rewrite steps and their learned guidance. We argue that the Prolog setting is suitable for combining statistical and symbolic learning methods. The complete toolkit is publicly released.

AI · Mar 20, 2020
Tactic Learning and Proving for the Coq Proof Assistant

Lasse Blaauwbroek, Josef Urban, Herman Geuvers

We present a system that utilizes machine learning for tactic proof search in the Coq Proof Assistant. In a similar vein as the TacticToe project for HOL4, our system predicts appropriate tactics and finds proofs in the form of tactic scripts. To do this, it learns from previous tactic scripts and how they are applied to proof states. The performance of the system is evaluated on the Coq Standard Library. Currently, our predictor can identify the correct tactic to be applied to a proof state 23.4% of the time. Our proof searcher can fully automatically prove 39.3% of the lemmas. When combined with the CoqHammer system, the two systems together prove 56.7% of the library's lemmas.

LO · Mar 11, 2020
Stateful Premise Selection by Recurrent Neural Networks

Bartosz Piotrowski, Josef Urban

In this work, we develop a new learning-based method for selecting facts (premises) when proving new goals over large formal libraries. Unlike previous methods that choose sets of facts independently of each other by their rank, the new method uses the notion of state that is updated each time a choice of a fact is made. Our stateful architecture is based on recurrent neural networks, which have recently been very successful in stateful tasks such as language translation. The new method is combined with data augmentation techniques, evaluated in several ways on a standard large-theory benchmark, and compared to a state-of-the-art premise selection approach based on gradient boosted trees. It is shown to perform significantly better and to solve many new problems.

AI · Feb 13, 2020
ENIGMA Anonymous: Symbol-Independent Inference Guiding Machine (system description)

Jan Jakubův, Karel Chvalovský, Miroslav Olšák et al.

We describe an implementation of gradient boosting and neural guidance of saturation-style automated theorem provers that does not depend on consistent symbol names across problems. For the gradient-boosting guidance, we manually create abstracted features by considering arity-based encodings of formulas. For the neural guidance, we use symbol-independent graph neural networks (GNNs) and their embedding of the terms and clauses. The two methods are efficiently implemented in the E prover and its ENIGMA learning-guided framework. To provide competitive real-time performance of the GNNs, we have developed a new context-based approach to evaluation of generated clauses in E. Clauses are evaluated jointly in larger batches and with respect to a large number of already selected clauses (context) by the GNN that estimates their collectively most useful subset in several rounds of message passing. This means that approximative inference rounds done by the GNN are efficiently interleaved with precise symbolic inference rounds done inside E. The methods are evaluated on the MPTP large-theory benchmark and shown to achieve comparable real-time performance to state-of-the-art symbol-based methods. The methods also show high complementarity, solving a large number of hard Mizar problems.

LO · Dec 5, 2019
Exploration of Neural Machine Translation in Autoformalization of Mathematics in Mizar

Qingxiang Wang, Chad Brown, Cezary Kaliszyk et al.

In this paper we share several experiments trying to automatically translate informal mathematics into formal mathematics. In our context informal mathematics refers to human-written mathematical sentences in the LaTeX format; and formal mathematics refers to statements in the Mizar language. We conducted our experiments against three established neural network-based machine translation models that are known to deliver competitive results on translating between natural languages. To train these models we also prepared four informal-to-formal datasets. We compare and analyze our results according to whether the model is supervised or unsupervised. In order to augment the data available for auto-formalization and improve the results, we develop a custom type-elaboration mechanism and integrate it in the supervised translation.

AI · Nov 27, 2019
Property Invariant Embedding for Automated Reasoning

Miroslav Olšák, Cezary Kaliszyk, Josef Urban

Automated reasoning and theorem proving have recently become major challenges for machine learning. In other domains, representations that are able to abstract over unimportant transformations, such as abstraction over translations and rotations in vision, are becoming more common. Standard methods of embedding mathematical formulas for learning theorem proving are, however, not yet able to handle many important transformations. In particular, embedding previously unseen labels, which often arise in definitional encodings and in Skolemization, has been very weak so far. Similar problems appear when transferring knowledge between known symbols. We propose a novel encoding of formulas that extends existing graph neural network models. This encoding represents symbols only by nodes in the graph, without giving the network any knowledge of the original labels. We provide additional links between such nodes that allow the network to recover the meaning and therefore correctly embed such nodes irrespective of the given labels. We test the proposed encoding in an automated theorem prover based on the tableaux connection calculus, and show that it improves on the best characterizations used so far. The encoding is further evaluated on the premise selection task and a newly introduced symbol guessing task, and shown to correctly predict 65% of the symbol names.

AI · Nov 7, 2019
Can Neural Networks Learn Symbolic Rewriting?

Bartosz Piotrowski, Josef Urban, Chad E. Brown et al.

This work investigates whether current neural architectures are adequate for learning symbolic rewriting. Two kinds of data sets are proposed for this research: one based on automated proofs and the other being a synthetic set of polynomial terms. Experiments using current neural machine translation models are performed and their results are discussed. Ideas for extending this line of research are proposed, and its relevance is motivated.

LO · May 30, 2019
Towards Finding Longer Proofs

Zsolt Zombori, Adrián Csiszárik, Henryk Michalewski et al.

We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). Unlike most learning based approaches, we focus on generalising from very little training data and achieving near complete confidence. We use several simple, structured datasets with very long proofs to show that FLoP can successfully generalise a single training proof to a large class of related problems. On these benchmarks, FLoP is competitive with strong theorem provers despite using very limited search, due to its ability to solve problems that are prohibitively long for other systems.

AI · May 23, 2019
ENIGMAWatch: ProofWatch Meets ENIGMA

Zarathustra Goertzel, Jan Jakubův, Josef Urban

In this work we describe a new learning-based proof guidance, ENIGMAWatch, for saturation-style first-order theorem provers. ENIGMAWatch combines two guiding approaches for the given-clause selection implemented for the E ATP system: ProofWatch and ENIGMA. ProofWatch is motivated by the watchlist (hints) method and based on symbolic matching of multiple related proofs, while ENIGMA is based on statistical machine learning. The two methods are combined by using the evolving information about symbolic proof matching as additional information that characterizes the saturation-style proof search for the statistical learning methods. The new system is experimentally evaluated on a large set of problems from the Mizar Library. We show that the added proof-matching information is considered important by the statistical machine learners, and that it leads to improvements in E's performance over ProofWatch and ENIGMA.

AI · May 20, 2019
Guiding Inferences in Connection Tableau by Recurrent Neural Networks

Bartosz Piotrowski, Josef Urban

We present a dataset and experiments on applying recurrent neural networks (RNNs) for guiding clause selection in the connection tableau proof calculus. The RNN encodes a sequence of literals from the current branch of the partial proof tree to a hidden vector state; using it, the system selects a clause for extending the proof tree. The training data and learning setup are described, and the results are discussed and compared with state of the art using gradient boosted trees. Additionally, we perform a conjecturing experiment in which the RNN does not just select an existing clause, but completely constructs the next tableau goal.

AI · Apr 2, 2019
Hammering Mizar by Learning Clause Guidance

Jan Jakubův, Josef Urban

We describe a very large improvement of existing hammer-style proof automation over large ITP libraries by combining learning and theorem proving. In particular, we have integrated state-of-the-art machine learners into the E automated theorem prover, and developed methods that allow learning and efficient internal guidance of E over the whole Mizar library. The resulting trained system improves the real-time performance of E on the Mizar library by 70% in a single-strategy setting.

AI · Mar 7, 2019
ENIGMA-NG: Efficient Neural and Gradient-Boosted Inference Guidance for E

Karel Chvalovský, Jan Jakubův, Martin Suda et al.

We describe an efficient implementation of clause guidance in saturation-based automated theorem provers extending the ENIGMA approach. Unlike in the first ENIGMA implementation, where a fast linear classifier is trained and used together with manually engineered features, we have started to experiment with more sophisticated state-of-the-art machine learning methods such as gradient boosted trees and recursive neural networks. The latter approach in particular poses challenges in terms of efficiency of clause evaluation. However, we show that deep integration of the neural evaluation with the ATP data-structures can largely amortize this cost and lead to competitive real-time results. Both methods are evaluated on a large dataset of theorem proving problems and compared with the previous approaches. The resulting methods improve on the manually designed clause guidance, providing the first practically convincing application of gradient-boosted and neural clause guidance in saturation-style automated theorem provers.

AI · May 19, 2018
Reinforcement Learning of Theorem Proving

Cezary Kaliszyk, Josef Urban, Henryk Michalewski et al.

We introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search. Instead, it runs many Monte-Carlo simulations guided by reinforcement learning from previous proof attempts. We produce several versions of the prover, parameterized by different learning and guiding algorithms. The strongest version of the system is trained on a large corpus of mathematical problems and evaluated on previously unseen problems. The trained system solves within the same number of inferences over 40% more problems than a baseline prover, which is an unusually high improvement in this hard AI domain. To our knowledge this is the first time reinforcement learning has been convincingly applied to solving general mathematical problems on a large scale.

CL · May 10, 2018
First Experiments with Neural Translation of Informal to Formal Mathematics

Qingxiang Wang, Cezary Kaliszyk, Josef Urban

We report on our experiments to train deep neural networks that automatically translate informalized LaTeX-written Mizar texts into the formal Mizar language. To the best of our knowledge, this is the first time neural networks have been adopted in the formalization of mathematics. Using Luong et al.'s neural machine translation model (NMT), we tested our aligned informal-formal corpora against various hyperparameters and evaluated their results. Our experiments show that our best performing model configurations are able to generate correct Mizar statements on 65.73% of the inference data, with the union of all models covering 79.17%. These results indicate that formalization through artificial neural networks is a promising approach for automated formalization of mathematics. We present several case studies to illustrate our results.

AIApr 2, 2018
TacticToe: Learning to Prove with Tactics

Thibault Gauthier, Cezary Kaliszyk, Josef Urban et al.

We implement an automated tactical prover TacticToe on top of the HOL4 interactive theorem prover. TacticToe learns from human proofs which mathematical technique is suitable in each proof situation. This knowledge is then used in a Monte Carlo tree search algorithm to explore promising tactic-level proof paths. On a single CPU, with a time limit of 60 seconds, TacticToe proves 66.4 percent of the 7164 theorems in HOL4's standard library, whereas E prover with auto-schedule solves 34.5 percent. The success rate rises to 69.0 percent by combining the results of TacticToe and E prover.

AIApr 2, 2018
Learning to Reason with HOL4 tactics

Thibault Gauthier, Cezary Kaliszyk, Josef Urban

Techniques combining machine learning with translation to automated reasoning have recently become an important component of formal proof assistants. Such "hammer" techniques complement traditional proof assistant automation as implemented by tactics and decision procedures. In this paper we present a unified proof assistant automation approach which attempts to automate the selection of appropriate tactics and tactic-sequences combined with an optimized small-scale hammering approach. We implement the technique as a tactic-level automation for HOL4: TacticToe. It implements a modified A*-algorithm directly in HOL4 that explores different tactic-level proof paths, guiding their selection by learning from a large number of previous tactic-level proofs. Unlike the existing hammer methods, TacticToe avoids translation to FOL, working directly on the HOL level. By combining tactic prediction and premise selection, TacticToe is able to re-prove 39 percent of 7902 HOL4 theorems in 5 seconds whereas the best single HOL(y)Hammer strategy solves 32 percent in the same amount of time.

AIFeb 12, 2018
ProofWatch: Watchlist Guidance for Large Theories in E

Zarathustra Goertzel, Jan Jakubův, Stephan Schulz et al.

Watchlist (also hint list) is a mechanism that allows related proofs to guide a proof search for a new conjecture. This mechanism has been used with the Otter and Prover9 theorem provers, both for interactive formalizations and for human-assisted proving of open conjectures in small theories. In this work we explore the use of watchlists in large theories coming from first-order translations of large ITP libraries, aiming at improving hammer-style automation by smarter internal guidance of the ATP systems. In particular, we (i) design watchlist-based clause evaluation heuristics inside the E ATP system, and (ii) develop new proof guiding algorithms that load many previous proofs inside the ATP and focus the proof search using a dynamically updated notion of proof matching. The methods are evaluated on a large set of problems coming from the Mizar library, showing significant improvement of E's standard portfolio of strategies, and also of the previous best set of strategies invented for Mizar by evolutionary methods.

AIFeb 9, 2018
ATPboost: Learning Premise Selection in Binary Setting with ATP Feedback

Bartosz Piotrowski, Josef Urban

ATPboost is a system for solving sets of large-theory problems by interleaving ATP runs with state-of-the-art machine learning of premise selection from the proofs. Unlike many previous approaches that use a multi-label setting, the learning is implemented as binary classification that estimates the pairwise relevance of (theorem, premise) pairs. For this, ATPboost uses the XGBoost gradient boosting algorithm, which is fast and has state-of-the-art performance on many tasks. Learning in the binary setting, however, requires negative examples, which are nontrivial to obtain because many alternative proofs exist. We discuss and implement several solutions in the context of the ATP/ML feedback loop, and show that ATPboost with such methods significantly outperforms the k-nearest neighbors multilabel classifier.
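
The binary setting described above can be illustrated with a small data-preparation sketch. It is not ATPboost's actual procedure: the function name `build_binary_examples`, the toy premise names, and the choice of uniform random sampling for negatives are all assumptions made here for illustration, and the abstract only says that several solutions for negative examples were implemented.

```python
import random


def build_binary_examples(proofs, all_premises, negatives_per_positive=2, seed=0):
    """Turn proof data into labeled (theorem, premise, label) triples.

    proofs: dict mapping a theorem name to the set of premises used in a known proof.
    all_premises: list of premise names available in the library.
    """
    rng = random.Random(seed)
    examples = []
    for theorem, used in sorted(proofs.items()):
        # Positive examples: premises that occur in a known proof.
        for premise in sorted(used):
            examples.append((theorem, premise, 1))
        # Negative examples (assumption: sampled uniformly at random from
        # premises not used in the known proof). Such a premise may still
        # occur in an alternative proof, which is the difficulty noted above.
        unused = [p for p in all_premises if p not in used and p != theorem]
        n_neg = min(len(unused), negatives_per_positive * len(used))
        for premise in rng.sample(unused, n_neg):
            examples.append((theorem, premise, 0))
    return examples


# Toy library with hypothetical names.
proofs = {"thm1": {"p1", "p2"}, "thm2": {"p3"}}
all_premises = ["p1", "p2", "p3", "p4", "p5", "p6"]
examples = build_binary_examples(proofs, all_premises)
for triple in examples:
    print(triple)
```

The resulting triples could then be featurized and passed to any binary classifier; the abstract names XGBoost for that step.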

LOJan 23, 2017
ENIGMA: Efficient Learning-based Inference Guiding Machine

Jan Jakubův, Josef Urban

ENIGMA is a learning-based method for guiding given clause selection in saturation-based theorem provers. Clauses from many proof searches are classified as positive and negative based on their participation in the proofs. An efficient classification model is trained on this data, using fast feature-based characterization of the clauses. The learned model is then tightly linked with the core prover and used as a basis of a new parameterized evaluation heuristic that provides fast ranking of all generated clauses. The approach is evaluated on the E prover and the CASC 2016 AIM benchmark, showing a large increase of E's performance.
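
The label-train-rank loop described above can be sketched as follows. This is only an illustration under stated assumptions: the bag-of-symbols `features` function and the perceptron are stand-ins chosen here for brevity, not the features or the classifier used in ENIGMA, and the clause strings are toy data.

```python
import re
from collections import Counter


def features(clause):
    """Bag-of-symbols features (a hypothetical stand-in for real clause features)."""
    return Counter(re.findall(r"[A-Za-z0-9_]+", clause))


def score(weights, clause):
    """Linear score of a clause; higher means 'more proof-like'."""
    return sum(weights[f] * v for f, v in features(clause).items())


def train_perceptron(labeled, epochs=20):
    """Train a linear model on (clause, label) pairs with label in {+1, -1}."""
    weights = Counter()
    for _ in range(epochs):
        for clause, label in labeled:
            if label * score(weights, clause) <= 0:
                # Misclassified (or undecided): move weights toward the label.
                for f, v in features(clause).items():
                    weights[f] += label * v
    return weights


# Clauses that took part in a proof are positive (+1), the rest negative (-1).
labeled = [("p(a)", 1), ("q(a,b)", 1), ("r(f(c))", -1), ("r(c)", -1)]
weights = train_perceptron(labeled)

# Rank newly generated clauses by the learned score, best first.
candidates = ["r(b)", "q(a)"]
ranked = sorted(candidates, key=lambda c: score(weights, c), reverse=True)
print(ranked)
```

In a prover, a score of this kind would be one input to the clause evaluation heuristic rather than the sole selection criterion.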

CLNov 29, 2016
Semantic Parsing of Mathematics by Context-based Learning from Aligned Corpora and Theorem Proving

Cezary Kaliszyk, Josef Urban, Jiří Vyskočil

We study methods for automated parsing of informal mathematical expressions into formal ones, a main prerequisite for deep computer understanding of informal mathematical texts. We propose a context-based parsing approach that combines efficient statistical learning of deep parse trees with their semantic pruning by type checking and large-theory automated theorem proving. We show that the methods very significantly improve on previous results in parsing theorems from the Flyspeck corpus.

LONov 26, 2016
BliStrTune: Hierarchical Invention of Theorem Proving Strategies

Jan Jakubuv, Josef Urban

Inventing targeted proof search strategies for specific problem sets is a difficult task. State-of-the-art automated theorem provers (ATPs) such as E allow a large number of user-specified proof search strategies described in a rich domain specific language. Several machine learning methods that invent strategies automatically for ATPs were proposed previously. One of them is the Blind Strategymaker (BliStr), a system for automated invention of ATP strategies. In this paper we introduce BliStrTune -- a hierarchical extension of BliStr. BliStrTune allows exploring a much larger space of E strategies by interleaving search for high-level parameters with their fine-tuning. We use BliStrTune to invent new strategies based also on new clause weight functions targeted at problems from large ITP libraries. We show that the new strategies significantly improve E's performance in solving problems from the Mizar Mathematical Library.

LONov 18, 2016
Monte Carlo Tableau Proof Search

Michael Färber, Cezary Kaliszyk, Josef Urban

We study Monte Carlo Tree Search to guide proof search in tableau calculi. This includes proposing a number of proof-state evaluation heuristics, some of which are learnt from previous proofs. We present an implementation based on the leanCoP prover. The system is trained and evaluated on a large suite of related problems coming from the Mizar proof assistant, showing that it is capable of finding new and different proofs.

AIJun 14, 2016
DeepMath - Deep Sequence Models for Premise Selection

Alex A. Alemi, Francois Chollet, Niklas Een et al.

We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.

LOMay 23, 2016
Extracting Higher-Order Goals from the Mizar Mathematical Library

Chad Brown, Josef Urban

Certain constructs allowed in Mizar articles cannot be represented in first-order logic but can be represented in higher-order logic. We describe a way to obtain higher-order theorem proving problems from Mizar articles that make use of these constructs. In particular, higher-order logic is used to represent schemes, a global choice construct and set level binders. The higher-order automated theorem provers Satallax and LEO-II have been run on collections of these problems and the results are discussed.

LOOct 20, 2014
Certified Connection Tableaux Proofs for HOL Light and TPTP

Cezary Kaliszyk, Josef Urban, Jiri Vyskocil

In recent years, the Metis prover based on ordered paramodulation and model elimination has replaced the earlier built-in methods for general-purpose proof automation in HOL4 and Isabelle/HOL. In the annual CASC competition, however, the leanCoP system based on connection tableaux has performed better than Metis. In this paper we show how leanCoP's core algorithm can be implemented inside HOL Light. leanCoP's flagship feature, namely its minimalistic core, results in a very simple proof system. This plays a crucial role in extending the MESON proof reconstruction mechanism to connection tableaux proofs, providing an implementation of leanCoP that certifies its proofs. We discuss the differences between our direct implementation using an explicit Prolog stack and the continuation-passing implementation of MESON present in HOL Light, and compare their performance on all core HOL Light goals. The resulting prover can also be used as a general-purpose TPTP prover. We compare its performance against the resolution-based Metis on TPTP and other interesting datasets.

LOOct 20, 2014
Machine Learning of Coq Proof Guidance: First Experiments

Cezary Kaliszyk, Lionel Mamane, Josef Urban

We report the results of the first experiments with learning proof dependencies from the formalizations done with the Coq system. We explain the process of obtaining the dependencies from the Coq proofs, the characterization of formulas that is used for the learning, and the evaluation method. Various machine learning methods are compared on a dataset of 5021 toplevel Coq proofs coming from the CoRN repository. The best resulting method covers on average 75% of the needed proof dependencies among the first 100 predictions, which is comparable to the performance of such initial experiments on other large-theory corpora.
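
The coverage figure above can be read as a recall-style measure over ranked predictions. A minimal sketch of that reading, assuming hypothetical function names (`coverage_at_k`, `mean_coverage`) and toy lemma names that do not come from the paper:

```python
def coverage_at_k(ranked_predictions, needed, k=100):
    """Fraction of the needed dependencies that appear among the first k predictions."""
    needed = set(needed)
    if not needed:
        # Assumption: a proof with no dependencies counts as fully covered.
        return 1.0
    top_k = set(ranked_predictions[:k])
    return len(needed & top_k) / len(needed)


def mean_coverage(examples, k=100):
    """Average coverage over (ranked_predictions, needed_dependencies) pairs."""
    return sum(coverage_at_k(r, n, k) for r, n in examples) / len(examples)


# Toy data: each entry pairs a ranked prediction list with the dependencies
# actually used in the proof.
examples = [
    (["lem_a", "lem_b", "lem_c", "lem_d"], {"lem_a", "lem_d"}),
    (["lem_x", "lem_y", "lem_z"], {"lem_y"}),
]
print(mean_coverage(examples, k=2))
```

Averaging per proof, as done here, weights every proof equally regardless of how many dependencies it has; whether the paper averages this way is not stated in the abstract.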

AIMay 14, 2014
Developing Corpus-based Translation Methods between Informal and Formal Mathematics: Project Description

Cezary Kaliszyk, Josef Urban, Jiri Vyskocil et al.

The goal of this project is to (i) accumulate annotated informal/formal mathematical corpora suitable for training semi-automated translation between informal and formal mathematics by statistical machine-translation methods, (ii) to develop such methods oriented at the formalization task, and in particular (iii) to combine such methods with learning-assisted automated reasoning that will serve as a strong semantic component. We describe these ideas, the initial set of corpora, and some initial experiments done over them.

LGFeb 11, 2014
Machine Learner for Automated Reasoning 0.4 and 0.5

Cezary Kaliszyk, Josef Urban, Jiří Vyskočil

Machine Learner for Automated Reasoning (MaLARea) is a learning and reasoning system for proving in large formal libraries where thousands of theorems are available when attacking a new conjecture, and a large number of related problems and proofs can be used to learn specific theorem-proving knowledge. The last version of the system has by a large margin won the 2013 CASC LTB competition. This paper describes the motivation behind the methods used in MaLARea, discusses the general approach and the issues arising in evaluation of such a system, and describes the Mizar@Turing100 and CASC'24 versions of MaLARea.

AIFeb 11, 2014
Learning-assisted Theorem Proving with Millions of Lemmas

Cezary Kaliszyk, Josef Urban

Large formal mathematical libraries consist of millions of atomic inference steps that give rise to a corresponding number of proved statements (lemmas). Analogously to the informal mathematical practice, only a tiny fraction of such statements is named and re-used in later proofs by formal mathematicians. In this work, we suggest and implement criteria defining the estimated usefulness of the HOL Light lemmas for proving further theorems. We use these criteria to mine the large inference graph of the lemmas in the HOL Light and Flyspeck libraries, adding up to millions of the best lemmas to the pool of statements that can be re-used in later proofs. We show that in combination with learning-based relevance filtering, such methods significantly strengthen automated theorem proving of new conjectures over large formal mathematical libraries such as Flyspeck.

AIOct 10, 2013
MizAR 40 for Mizar 40

Cezary Kaliszyk, Josef Urban

As a present to Mizar on its 40th anniversary, we develop an AI/ATP system that in 30 seconds of real time on a 14-CPU machine automatically proves 40% of the theorems in the latest official version of the Mizar Mathematical Library (MML). This is a considerable improvement over previous performance of large-theory AI/ATP methods measured on the whole MML. To achieve that, a large suite of AI/ATP methods is employed and further developed. We implement the most useful methods efficiently, to scale them to the 150000 formulas in MML. This reduces the training times over the corpus to 1-3 seconds, allowing a simple practical deployment of the methods in the online automated reasoning service for the Mizar users (MizAR).

AIOct 10, 2013
Lemma Mining over HOL Light

Cezary Kaliszyk, Josef Urban

Large formal mathematical libraries consist of millions of atomic inference steps that give rise to a corresponding number of proved statements (lemmas). Analogously to the informal mathematical practice, only a tiny fraction of such statements is named and re-used in later proofs by formal mathematicians. In this work, we suggest and implement criteria defining the estimated usefulness of the HOL Light lemmas for proving further theorems. We use these criteria to mine the large inference graph of all lemmas in the core HOL Light library, adding thousands of the best lemmas to the pool of named statements that can be re-used in later proofs. The usefulness of the new lemmas is then evaluated by comparing the performance of automated proving of the core HOL Light theorems with and without such added lemmas.