AIJul 13, 2023
Reinforcement Learning and Data-Generation for Syntax-Guided SynthesisJulian Parsert, Elizabeth Polgreen
Program synthesis is the task of automatically generating code based on a specification. In Syntax-Guided Synthesis (SyGuS) this specification is a combination of a syntactic template and a logical formula, and the result is guaranteed to satisfy both. We present a reinforcement-learning guided algorithm for SyGuS which uses Monte-Carlo Tree Search (MCTS) to search the space of candidate solutions. Our algorithm learns policy and value functions which, combined with the upper confidence bound for trees, allow it to balance exploration and exploitation. A common challenge in applying machine learning approaches to syntax-guided synthesis is the scarcity of training data. To address this, we present a method for automatically generating training data for SyGuS based on anti-unification of existing first-order satisfiability problems, which we use to train our MCTS policy. We implement and evaluate this setup and demonstrate that learned policy and value improve the synthesis performance over a baseline by over 26 percentage points in the training and testing sets. Our tool outperforms state-of-the-art tool cvc5 on the training set and performs comparably in terms of the total number of problems solved on the testing set (solving 23% of the benchmarks on which cvc5 fails). We make our data set publicly available, to enable further application of machine learning methods to the SyGuS problem.
AIMar 6, 2024
Guiding Enumerative Program Synthesis with Large Language ModelsYixuan Li, Julian Parsert, Elizabeth Polgreen
Pre-trained Large Language Models (LLMs) are beginning to dominate the discourse around automatic code generation with natural language specifications. In contrast, the best-performing synthesizers in the domain of formal synthesis with precise logical specifications are still based on enumerative algorithms. In this paper, we evaluate the abilities of LLMs to solve formal synthesis benchmarks by carefully crafting a library of prompts for the domain. When one-shot synthesis fails, we propose a novel enumerative synthesis algorithm, which integrates calls to an LLM into a weighted probabilistic search. This allows the synthesizer to provide the LLM with information about the progress of the enumerator, and the LLM to provide the enumerator with syntactic guidance in an iterative loop. We evaluate our techniques on benchmarks from the Syntax-Guided Synthesis (SyGuS) competition. We find that GPT-3.5 as a stand-alone tool for formal synthesis is easily outperformed by state-of-the-art formal synthesis algorithms, but our approach integrating the LLM into an enumerative synthesis algorithm shows significant performance gains over both the LLM and the enumerative synthesizer alone and the winning SyGuS competition tool.
LOApr 8
One is all you need: Second-order Unification without First-order VariablesDavid M. Cerna, Julian Parsert
We introduce a fragment of second-order unification, referred to as \emph{Second-Order Ground Unification (SOGU)}, with the following properties: (i) only one second-order variable is allowed, and (ii) first-order variables do not occur. We study an equational variant of SOGU where the signature contains \textit{associative} binary function symbols (ASOGU) and show that Hilbert's 10$^{th}$ problem is reducible to ASOGU unifiability, thus proving undecidability. Our reduction provides a new lower bound for the undecidability of second-order unification, as previous results required first-order variable occurrences, multiple second-order variables, and/or equational theories involving \textit{length-reducing} rewrite systems. Furthermore, our reduction holds even in the case when associativity of the binary function symbol is restricted to \emph{power associative}, i.e. f(f(x,x),x)= f(x,f(x,x)), as our construction requires a single constant.
AINov 24, 2025
Extracting Robust Register Automata from Neural Networks over Data SequencesChih-Duo Hong, Hongjian Jiang, Anthony W. Lin et al.
Automata extraction is a method for synthesising interpretable surrogates for black-box neural models that can be analysed symbolically. Existing techniques assume a finite input alphabet, and thus are not directly applicable to data sequences drawn from continuous domains. We address this challenge with deterministic register automata (DRAs), which extend finite automata with registers that store and compare numeric values. Our main contribution is a framework for robust DRA extraction from black-box models: we develop a polynomial-time robustness checker for DRAs with a fixed number of registers, and combine it with passive and active automata learning algorithms. This combination yields surrogate DRAs with statistical robustness and equivalence guarantees. As a key application, we use the extracted automata to assess the robustness of neural networks: for a given sequence and distance metric, the DRA either certifies local robustness or produces a concrete counterexample. Experiments on recurrent neural networks and transformer architectures show that our framework reliably learns accurate automata and enables principled robustness evaluation. Overall, our results demonstrate that robust DRA extraction effectively bridges neural network interpretability and formal reasoning without requiring white-box access to the underlying network.
LGFeb 7, 2021
Neural Termination AnalysisMirco Giacobbe, Daniel Kroening, Julian Parsert
We introduce a novel approach to the automated termination analysis of computer programs: we use neural networks to represent ranking functions. Ranking functions map program states to values that are bounded from below and decrease as a program runs; the existence of a ranking function proves that the program terminates. We train a neural network from sampled execution traces of a program so that the network's output decreases along the traces; then, we use symbolic reasoning to formally verify that it generalises to all possible executions. Upon the affirmative answer we obtain a formal certificate of termination for the program, which we call a neural ranking function. We demonstrate that thanks to the ability of neural networks to represent nonlinear functions our method succeeds over programs that are beyond the reach of state-of-the-art tools. This includes programs that use disjunctions in their loop conditions and programs that include nonlinear expressions.
LGJan 22, 2021
A Study of Continuous Vector Representationsfor Theorem ProvingStanisław Purgał, Julian Parsert, Cezary Kaliszyk
Applying machine learning to mathematical terms and formulas requires a suitable representation of formulas that is adequate for AI methods. In this paper, we develop an encoding that allows for logical properties to be preserved and is additionally reversible. This means that the tree shape of a formula including all symbols can be reconstructed from the dense vector representation. We do that by training two decoders: one that extracts the top symbol of the tree and one that extracts embedding vectors of subtrees. The syntactic and semantic logical properties that we aim to reserve include both structural formula properties, applicability of natural deduction steps, and even more complex operations like unifiability. We propose datasets that can be used to train these syntactic and semantic properties. We evaluate the viability of the developed encoding across the proposed datasets as well as for the practical theorem proving problem of premise selection in the Mizar corpus.