Nathan Fulton

AI
h-index: 6
8 papers
404 citations
Novelty: 46%
AI Score: 42

8 Papers

LG · Oct 26, 2022 · Code
Multi-lingual Evaluation of Code Generation Models

Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang et al. · amazon-science, ibm-research

We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and we discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings. Furthermore, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for other code-related evaluations such as code insertion, robustness, or summarization tasks. Overall, our benchmarks represent a significant step towards a deeper understanding of language models' code generation abilities. We publicly release our code and datasets at https://github.com/amazon-research/mxeval.

SE · Mar 23
LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface

Michael Hind, Basel Shbita, Bo Wu et al.

Textual Large Language Models (LLMs) provide a simple and familiar interface: a string of text is used for both input and output. However, the information conveyed to an LLM often has a richer structure and semantics, which is not conveyed in a string. For example, most prompts contain both instructions ("Summarize this paper into a paragraph") and data (the paper to summarize), but these are usually not distinguished when passed to the model. This can lead to model confusion and security risks, such as prompt injection attacks. This work addresses this shortcoming by introducing an LLM-native mark-up language, LLMON (LLM Object Notation, pronounced "Lemon"), that enables the structure and semantic metadata of the text to be communicated in a natural way to an LLM. This information can then be used during model training, model prompting, and inference implementation, leading to improvements in model accuracy, safety, and security. This is analogous to how programming language types can be used for many purposes, such as static checking, code generation, dynamic checking, and IDE highlighting. We discuss the general design requirements of an LLM-native markup language, introduce the LLMON markup language and show how it meets these design requirements, describe how the information contained in a LLMON artifact can benefit model training and inference implementation, and provide some preliminary empirical evidence of its value for both of these use cases. We also discuss broader issues and research opportunities that are enabled with an LLM-native approach.

PL · Feb 28, 2024
Constrained Decoding for Fill-in-the-Middle Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars

Daniel Melcer, Nathan Fulton, Sanjay Krishna Gouda et al.

Large Language Models are powerful tools for program synthesis and advanced auto-completion, but come with no guarantee that their output code is syntactically correct. This paper contributes an incremental parser that allows early rejection of syntactically incorrect code, as well as efficient detection of complete programs for fill-in-the-middle (FIM) tasks. We extend the Earley parsing algorithm to allow for left and right quotients of context-free grammars, and develop methods to handle quotienting of several context-sensitive features present in the grammars of many common programming languages. The result of these contributions is an efficient, general, and well-grounded method for left and right quotient parsing. To validate our theoretical contributions -- and the effectiveness of certain design decisions -- we evaluate our method on the particularly difficult case of FIM completion for Python 3, with syntax-correctness constraints. Our results demonstrate that constrained generation can significantly reduce the incidence of syntax errors in recommended code.
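The idea of early rejection and completeness detection for fill-in-the-middle can be illustrated on a toy language. The sketch below is not the paper's Earley-based quotient parser: it only tracks bracket nesting, and the names `open_stack` and `classify` are hypothetical. It also assumes the right context can be closed by some middle, which it does not verify.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def open_stack(text, stack=()):
    """Return the unclosed openers after reading text, or None on a mismatch."""
    stack = list(stack)
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return None
    return stack


def classify(prefix, middle, suffix):
    """Classify a candidate middle for the given left and right context."""
    after_middle = open_stack(prefix + middle)
    if after_middle is None:
        # No continuation of the middle can repair a mismatched closer.
        return "reject"
    if open_stack(suffix, after_middle) == []:
        # prefix + middle + suffix is balanced: generation can stop here.
        return "complete"
    # Not yet balanced against the suffix: keep decoding.
    return "incomplete"
```

A decoder could call `classify` after each proposed token and discard any candidate that returns `"reject"`, for example `classify("f(", "a)]", "])")`.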

CR · Jun 3, 2021
Relational Analysis of Sensor Attacks on Cyber-Physical Systems

Jian Xiang, Nathan Fulton, Stephen Chong

Cyber-physical systems, such as self-driving cars or autonomous aircraft, must defend against attacks that target sensor hardware. Analyzing system design can help engineers understand how a compromised sensor could impact the system's behavior; however, designing security analyses for cyber-physical systems is difficult due to their combination of discrete dynamics, continuous dynamics, and nondeterminism. This paper contributes a framework for modeling and analyzing sensor attacks on cyber-physical systems, using the formalism of hybrid programs. We formalize and analyze two relational properties of a system's robustness. These relational properties respectively express (1) whether a system's safety property can be influenced by sensor attacks, and (2) whether a system's high-integrity state can be affected by sensor attacks. We characterize these relational properties by defining an equivalence relation between a system under attack and the original unattacked system. That is, the system satisfies the robustness properties if executions of the attacked system are appropriately related to executions of the unattacked system. We present two techniques for reasoning about the equivalence relation and thus proving the relational properties for a system. One proof technique decomposes large proof obligations to smaller proof obligations. The other proof technique adapts the self-composition technique from the literature on secure information-flow, allowing us to reduce reasoning about the equivalence of two systems to reasoning about properties of a single system. This technique allows us to reuse existing tools for reasoning about properties of hybrid programs, but is challenging due to the combination of discrete dynamics, continuous dynamics, and nondeterminism. To evaluate, we present three case studies motivated by real design flaws in existing cyber-physical systems.

AI · Sep 23, 2020
CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq

Koundinya Vajjha, Avraham Shinnar, Vasily Pestun et al.

Reinforcement learning algorithms solve sequential decision-making problems in probabilistic environments by optimizing for long-term reward. The desire to use reinforcement learning in safety-critical settings inspires a recent line of work on formally constrained reinforcement learning; however, these methods place the implementation of the learning algorithm in their Trusted Computing Base. The crucial correctness property of these implementations is a guarantee that the learning algorithm converges to an optimal policy. This paper begins the work of closing this gap by developing a Coq formalization of two canonical reinforcement learning algorithms: value and policy iteration for finite-state Markov decision processes. The central results are a formalization of Bellman's optimality principle and its proof, which uses a contraction property of the Bellman optimality operator to establish that a sequence converges in the infinite horizon limit. The CertRL development exemplifies how the Giry monad and mechanized metric coinduction streamline optimality proofs for reinforcement learning algorithms. The CertRL library provides a general framework for proving properties about Markov decision processes and reinforcement learning algorithms, paving the way for further work on formalization of reinforcement learning algorithms.
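For readers unfamiliar with value iteration, here is a minimal Python sketch of the algorithm on a finite-state Markov decision process. It is an illustration only, unrelated to the Coq development; the function names, the data layout, and the two-state toy MDP are assumptions made for this example.

```python
def bellman_backup(values, transitions, rewards, gamma):
    """Apply the Bellman optimality operator once.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward for taking action a in state s.
    """
    return [
        max(
            rewards[s][a]
            + gamma * sum(p * values[t] for p, t in transitions[s][a])
            for a in range(len(transitions[s]))
        )
        for s in range(len(transitions))
    ]


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterate the backup until successive value estimates differ by less than tol."""
    values = [0.0] * len(transitions)
    for _ in range(max_iters):
        new_values = bellman_backup(values, transitions, rewards, gamma)
        gap = max(abs(n - v) for n, v in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    return values


# Hypothetical toy MDP: action 0 moves to state 0, action 1 moves to state 1.
TRANSITIONS = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(1.0, 1)]],
]
# Only staying in state 1 (state 1, action 1) is rewarded.
REWARDS = [
    [0.0, 0.0],
    [0.0, 1.0],
]
```

Because the backup is a contraction in the sup norm with factor `gamma` (for `gamma < 1`), repeated application converges to a unique fixed point regardless of the starting values; on the toy MDP with `gamma=0.9` the values approach 9 for state 0 and 10 for state 1.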

AI · Jul 2, 2020
Verifiably Safe Exploration for End-to-End Reinforcement Learning

Nathan Hunt, Nathan Fulton, Sara Magliacane et al.

Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We also prove that our method of enforcing the safety constraints preserves all safe policies from the original environment.
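One common way to enforce hard constraints during exploration is to filter each proposed action through a safety monitor. The sketch below shows that general pattern only. It is not the paper's method, which relies on object detection and verified hybrid-system models, and every name in it (`shield`, `stays_clear`, `OBSTACLE`) is hypothetical. It assumes the fallback action is itself always safe.

```python
def shield(state, proposed_action, is_safe, fallback_action):
    """Return the proposed action if the monitor accepts it, else the fallback."""
    if is_safe(state, proposed_action):
        return proposed_action
    return fallback_action


# Hypothetical 1-D example: an agent must stay strictly before an obstacle.
OBSTACLE = 10.0


def stays_clear(position, velocity):
    """Monitor: the next position must remain strictly before the obstacle."""
    return position + velocity < OBSTACLE
```

Because the filter only overrides actions that the monitor rejects, any policy that already picks safe actions passes through unchanged, for example `shield(8.0, 1.0, stays_clear, 0.0)`.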

SE · Jun 15, 2020
Formal Verification of End-to-End Learning in Cyber-Physical Systems: Progress and Challenges

Nathan Fulton, Nathan Hunt, Nghia Hoang et al.

Autonomous systems -- such as self-driving cars, autonomous drones, and automated trains -- must come with strong safety guarantees. Over the past decade, techniques based on formal methods have enjoyed some success in providing strong correctness guarantees for large software systems including operating system kernels, cryptographic protocols, and control software for drones. These successes suggest it might be possible to ensure the safety of autonomous systems by constructing formal, computer-checked correctness proofs. This paper identifies three assumptions underlying existing formal verification techniques, explains how each of these assumptions limits the applicability of verification in autonomous systems, and summarizes preliminary work toward improving the strength of evidence provided by formal verification.

AI · Feb 14, 2019
Verifiably Safe Off-Model Reinforcement Learning

Nathan Fulton, Andre Platzer

The desire to use reinforcement learning in safety-critical settings has inspired a recent interest in formal methods for learning algorithms. Existing formal methods for learning and optimization primarily consider the problem of constrained learning or constrained optimization. Given a single correct model and associated safety constraint, these approaches guarantee efficient learning while provably avoiding behaviors outside the safety constraint. Acting well given an accurate environmental model is an important pre-requisite for safe learning, but is ultimately insufficient for systems that operate in complex heterogeneous environments. This paper introduces verification-preserving model updates, the first approach toward obtaining formal safety guarantees for reinforcement learning in settings where multiple environmental models must be taken into account. Through a combination of design-time model updates and runtime model falsification, we provide a first approach toward obtaining formal safety proofs for autonomous systems acting in heterogeneous environments.
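Runtime model falsification can be pictured as discarding candidate environment models that disagree with what is actually observed. The sketch below illustrates only that filtering step, under strong simplifying assumptions: models are deterministic one-step predictors over a scalar state, and the names `falsify` and `CANDIDATES` are hypothetical. It does not reproduce the paper's verification-preserving model updates.

```python
def falsify(models, state, action, observed_next, tol=1e-6):
    """Keep only the models whose prediction matches the observed transition."""
    return {
        name: step
        for name, step in models.items()
        if abs(step(state, action) - observed_next) <= tol
    }


# Hypothetical candidate models of how velocity changes position.
CANDIDATES = {
    "dry_road": lambda position, velocity: position + velocity,
    "icy_road": lambda position, velocity: position + 0.5 * velocity,
}

# One observed transition: from position 0.0 with velocity 2.0 to position 1.0.
remaining = falsify(CANDIDATES, state=0.0, action=2.0, observed_next=1.0)
```

After this observation only the `icy_road` model remains, so later safety checks would use its constraints alone.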