Andrej Bauer

2papers

2 Papers

82.9LOApr 28
Sheaves as oracle computations

Danel Ahman, Andrej Bauer

In type theory, an oracle may be specified abstractly by a predicate whose domain is the type of queries asked of the oracle, and whose proofs are the oracle answers. Such a specification induces an oracle modality that captures a computational intuition about oracles: at each step of reasoning we either know the result, or we ask the oracle a query and proceed upon receiving an answer. We characterize an oracle modality as the least one forcing the given predicate. We establish an adjoint retraction between modalities and propositional containers, from which it follows that every modality is an oracle modality. The left adjoint maps sums to suprema, which makes suprema of modalities easy to compute when they are given in terms of oracle modalities. We also study sheaves for oracle modalities. We describe sheafification in terms of a quotient-inductive type of computation trees, and describe sheaves as algebras for the corresponding monad. We also introduce equifoliate trees, an intensional notion of oracle computation given by a (non-propositional) container. Equifoliate trees descend to sheaves, and modally cover them. As an application, we give a concrete description of all Lawvere-Tierney topologies in a realizability topos, closely related to a game-theoretic characterization by Takayuki Kihara.

LGOct 24, 2023
MLFMF: Data Sets for Machine Learning for Mathematical Formalization

Andrej Bauer, Matej Petković, Ljupčo Todorovski

We introduce MLFMF, a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes the largest Lean~4 library Mathlib, and some of the largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of s-expressions representing the syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the s-expressions give complete and easily parsed information about every entry. We report baseline results using standard graph and word embeddings, tree ensembles, and instance-based learning algorithms. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. The methodology used to extract the networks and the s-expressions readily applies to other libraries, and is applicable to other proof assistants. With more than $250\,000$ entries in total, this is currently the largest collection of formalized mathematical knowledge in machine learnable format.