PLMar 15, 2023
Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for CodeBrando Miranda, Avi Shinnar, Vasily Pestun et al.
Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi. This is an interesting area of inquiry for two reasons. First, typed lambda calculi are the lingua franc of programming languages. A set of heuristics that relate various typed lambda calculi to effective neural architectures would provide a systematic method for mapping language features (e.g., polymorphism, subtyping, inheritance, etc.) to architecture choices. Second, transformer models are widely used in deep learning architectures applied to code, but the design and hyperparameter space for them is large and relatively unexplored in programming language applications. Therefore, we suggest a benchmark that allows us to explore exactly this through perhaps the simplest and most fundamental property of a programming language: the relationship between terms and types. Consequently, we begin this inquiry of transformer architectures for typed lambda calculi by exploring the effect of transformer warm-up and optimizer selection in the task of type inference: i.e., predicting the types of lambda calculus terms using only transformers. We find that the optimization landscape is difficult even in this simple setting. One particular experimental finding is that optimization by Adafactor converges much faster compared to the optimization by Adam and RAdam. We conjecture that such different performance of optimizers might be related to the difficulties of generalization over formally generated dataset.
LGJan 5, 2024
Graph2Tac: Online Representation Learning of Formal Math ConceptsLasse Blaauwbroek, Miroslav Olšák, Jason Rute et al.
In proof assistants, the physical proximity between two formal mathematical concepts is a strong predictor of their mutual relevance. Furthermore, lemmas with close proximity regularly exhibit similar proof structures. We show that this locality property can be exploited through online learning techniques to obtain solving agents that far surpass offline learners when asked to prove theorems in an unseen mathematical setting. We extensively benchmark two such online solvers implemented in the Tactician platform for the Coq proof assistant: First, Tactician's online $k$-nearest neighbor solver, which can learn from recent proofs, shows a $1.72\times$ improvement in theorems proved over an offline equivalent. Second, we introduce a graph neural network, Graph2Tac, with a novel approach to build hierarchical representations for new definitions. Graph2Tac's online definition task realizes a $1.5\times$ improvement in theorems solved over an offline baseline. The $k$-NN and Graph2Tac solvers rely on orthogonal online data, making them highly complementary. Their combination improves $1.27\times$ over their individual performances. Both solvers outperform all other general-purpose provers for Coq, including CoqHammer, Proverbot9001, and a transformer baseline by at least $1.48\times$ and are available for practical use by end-users.
LOFeb 12, 2022
Formalization of a Stochastic Approximation TheoremKoundinya Vajjha, Barry Trager, Avraham Shinnar et al.
Stochastic approximation algorithms are iterative procedures which are used to approximate a target value in an environment where the target is unknown and direct observations are corrupted by noise. These algorithms are useful, for instance, for root-finding and function minimization when the target function or model is not directly known. Originally introduced in a 1951 paper by Robbins and Monro, the field of Stochastic approximation has grown enormously and has come to influence application domains from adaptive signal processing to artificial intelligence. As an example, the Stochastic Gradient Descent algorithm which is ubiquitous in various subdomains of Machine Learning is based on stochastic approximation theory. In this paper, we give a formal proof (in the Coq proof assistant) of a general convergence theorem due to Aryeh Dvoretzky, which implies the convergence of important classical methods such as the Robbins-Monro and the Kiefer-Wolfowitz algorithms. In the process, we build a comprehensive Coq library of measure-theoretic probability theory and stochastic processes.
AISep 23, 2020
CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in CoqKoundinya Vajjha, Avraham Shinnar, Vasily Pestun et al.
Reinforcement learning algorithms solve sequential decision-making problems in probabilistic environments by optimizing for long-term reward. The desire to use reinforcement learning in safety-critical settings inspires a recent line of work on formally constrained reinforcement learning; however, these methods place the implementation of the learning algorithm in their Trusted Computing Base. The crucial correctness property of these implementations is a guarantee that the learning algorithm converges to an optimal policy. This paper begins the work of closing this gap by developing a Coq formalization of two canonical reinforcement learning algorithms: value and policy iteration for finite state Markov decision processes. The central results are a formalization of Bellman's optimality principle and its proof, which uses a contraction property of Bellman optimality operator to establish that a sequence converges in the infinite horizon limit. The CertRL development exemplifies how the Giry monad and mechanized metric coinduction streamline optimality proofs for reinforcement learning algorithms. The CertRL library provides a general framework for proving properties about Markov decision processes and reinforcement learning algorithms, paving the way for further work on formalization of reinforcement learning algorithms.
CLNov 4, 2017
Language as a matrix product stateVasily Pestun, John Terilla, Yiannis Vlassopoulos
We propose a statistical model for natural language that begins by considering language as a monoid, then representing it in complex matrices with a compatible translation invariant probability measure. We interpret the probability measure as arising via the Born rule from a translation invariant matrix product state.
CLOct 27, 2017
Tensor network language modelVasily Pestun, Yiannis Vlassopoulos
We propose a new statistical model suitable for machine learning of systems with long distance correlations such as natural languages. The model is based on directed acyclic graph decorated by multi-linear tensor maps in the vertices and vector spaces in the edges, called tensor network. Such tensor networks have been previously employed for effective numerical computation of the renormalization group flow on the space of effective quantum field theories and lattice models of statistical mechanics. We provide explicit algebro-geometric analysis of the parameter moduli space for tree graphs, discuss model properties and applications such as statistical translation.