Andrew D. Gordon

PL
h-index17
10papers
352citations
Novelty36%
AI Score37

10 Papers

HCOct 2, 2023
Co-audit: tools to help humans double-check AI-generated content

Andrew D. Gordon, Carina Negreanu, José Cambronero et al. · microsoft-research

Users are increasingly being warned to check AI-generated content for correctness. Still, as LLMs (and other generative models) generate more complex output, such as summaries, tables, or code, it becomes harder for the user to audit or evaluate the output for quality or correctness. Hence, we are seeing the emergence of tool-assisted experiences to help the user double-check a piece of AI-generated content. We refer to these as co-audit tools. Co-audit tools complement prompt engineering techniques: one helps the user construct the input prompt, while the other helps them check the output response. As a specific example, this paper describes recent research on co-audit tools for spreadsheet computations powered by generative models. We explain why co-audit experiences are essential for any application of generative AI where quality is important and errors are consequential (as is common in spreadsheet computations). We propose a preliminary list of principles for co-audit, and outline research challenges.

HCAug 12, 2022
What is it like to program with artificial intelligence?

Advait Sarkar, Andrew D. Gordon, Carina Negreanu et al.

Large language models, such as OpenAI's codex and Deepmind's AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot. In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience reports of LLM-assisted programming, as well as prior usability and design studies. We find that while LLM-assisted programming shares some properties of compilation, pair programming, and programming via search and reuse, there are fundamental differences both in the technical possibilities as well as the practical experience. Thus, LLM-assisted programming ought to be viewed as a new way of programming with its own distinct properties and challenges. Finally, we draw upon observations from a user study in which non-expert end user programmers use LLM-assisted tools for solving data tasks in spreadsheets. We discuss the issues that might arise, and open research challenges, in applying large language models to end-user programming, particularly with users who have little or no programming expertise.

PLFeb 23
The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Zac Garby, Andrew D. Gordon, David Sands

A conversation with a large language model (LLM) is a sequence of prompts and responses, with each response generated from the preceding conversation. AI agents build such conversations automatically: given an initial human prompt, a planner loop interleaves LLM calls with tool invocations and code execution. This tight coupling creates a new and poorly understood attack surface. A malicious prompt injected into a conversation can compromise later reasoning, trigger dangerous tool calls, or distort final outputs. Despite the centrality of such systems, we currently lack a principled semantic foundation for reasoning about their behaviour and safety. We address this gap by introducing an untyped call-by-value lambda calculus enriched with dynamic information-flow control and a small number of primitives for constructing prompt-response conversations. Our language includes a primitive that invokes an LLM: it serializes a value, sends it to the model as a prompt, and parses the response as a new term. This calculus faithfully represents planner loops and their vulnerabilities, including the mechanisms by which prompt injection alters subsequent computation. The semantics explicitly captures conversations, and so supports reasoning about defenses such as quarantined sub-conversations, isolation of generated code, and information-flow restrictions on what may influence an LLM call. A termination-insensitive noninterference theorem establishes integrity and confidentiality guarantees, demonstrating that a formal calculus can provide rigorous foundations for safe agentic programming.

PLFeb 18, 2024
Solving Data-centric Tasks using Large Language Models

Shraddha Barke, Christian Poelitz, Carina Suzana Negreanu et al.

Large language models (LLMs) are rapidly replacing help forums like StackOverflow, and are especially helpful for non-professional programmers and end users. These users are often interested in data-centric tasks, such as spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including the data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.

PLOct 22, 2020
Conditional independence by typing

Maria I. Gorinova, Andrew D. Gordon, Charles Sutton et al.

A central goal of probabilistic programming languages (PPLs) is to separate modelling from inference. However, this goal is hard to achieve in practice. Users are often forced to re-write their models in order to improve efficiency of inference or meet restrictions imposed by the PPL. Conditional independence (CI) relationships among parameters are a crucial aspect of probabilistic models that capture a qualitative summary of the specified model and can facilitate more efficient inference. We present an information flow type system for probabilistic programming that captures conditional independence (CI) relationships, and show that, for a well-typed program in our system, the distribution it implements is guaranteed to have certain CI-relationships. Further, by using type inference, we can statically deduce which CI-properties are present in a specified model. As a practical application, we consider the problem of how to perform inference on models with mixed discrete and continuous parameters. Inference on such models is challenging in many existing PPLs, but can be improved through a workaround, where the discrete parameters are used implicitly, at the expense of manual model re-writing. We present a source-to-source semantics-preserving transformation, which uses our CI-type system to automate this workaround by eliminating the discrete parameters from a probabilistic program. The resulting program can be seen as a hybrid inference algorithm on the original program, where continuous parameters can be drawn using efficient gradient-based inference methods, while the discrete parameters are inferred using variable elimination. We implement our CI-type system and its example application in SlicStan: a compositional variant of Stan.

PLApr 1, 2020
OptTyper: Probabilistic Type Inference by Optimising Logical and Natural Constraints

Irene Vlassi Pandi, Earl T. Barr, Andrew D. Gordon et al.

We present a new approach to the type inference problem for dynamic languages. Our goal is to combine \emph{logical} constraints, that is, deterministic information from a type system, with \emph{natural} constraints, that is, uncertain statistical information about types learnt from sources like identifier names. To this end, we introduce a framework for probabilistic type inference that combines logic and learning: logical constraints on the types are extracted from the program, and deep learning is applied to predict types from surface-level code properties that are statistically associated. The foremost insight of our method is to constrain the predictions from the learning procedure to respect the logical constraints, which we achieve by relaxing the logical inference problem of type prediction into a continuous optimisation problem. We build a tool called OptTyper to predict missing types for TypeScript files. OptTyper combines a continuous interpretation of logical constraints derived by classical static analysis of TypeScript code, with natural constraints obtained from a deep learning model, which learns naming conventions for types from a large codebase. By evaluating OptTyper, we show that the combination of logical and natural constraints yields a large improvement in performance over either kind of information individually and achieves a 4% improvement over the state-of-the-art.

HCMay 30, 2019
Somewhere Around That Number: An Interview Study of How Spreadsheet Users Manage Uncertainty

Judith Borghouts, Andrew D. Gordon, Advait Sarkar et al.

Spreadsheet users regularly deal with uncertainty in their data, for example due to errors and estimates. While an insight into data uncertainty can help in making better informed decisions, prior research suggests that people often use informal heuristics to reason with probabilities, which leads to incorrect conclusions. Moreover, people often ignore or simplify uncertainty. To understand how people currently encounter and deal with uncertainty in spreadsheets, we conducted an interview study with 11 spreadsheet users from a range of domains. We found that how people deal with uncertainty is influenced by the role the spreadsheet plays in people's work and the user's aims. Spreadsheets are used as a database, template, calculation tool, notepad and exploration tool. In doing so, participants' aims were to compute and compare different scenarios, understand something about the nature of the uncertainty in their situation, and translate the complexity of data uncertainty into simplified presentations to other people, usually decision-makers. Spreadsheets currently provide limited tools to support these aims, and participants had various workarounds.

PLNov 2, 2018
Probabilistic Programming with Densities in SlicStan: Efficient, Flexible and Deterministic

Maria I. Gorinova, Andrew D. Gordon, Charles Sutton

Stan is a probabilistic programming language that has been increasingly used for real-world scalable projects. However, to make practical inference possible, the language sacrifices some of its usability by adopting a block syntax, which lacks compositionality and flexible user-defined functions. Moreover, the semantics of the language has been mainly given in terms of intuition about implementation, and has not been formalised. This paper provides a formal treatment of the Stan language, and introduces the probabilistic programming language SlicStan --- a compositional, self-optimising version of Stan. Our main contributions are: (1) the formalisation of a core subset of Stan through an operational density-based semantics; (2) the design and semantics of the Stan-like language SlicStan, which facilities better code reuse and abstraction through its compositional syntax, more flexible functions, and information-flow type system; and (3) a formal, semantic-preserving procedure for translating SlicStan to Stan.

PLApr 4, 2017
Deriving Probability Density Functions from Probabilistic Functional Programs

Sooraj Bhat, Johannes Borgström, Andrew D. Gordon et al.

The probability density function of a probability distribution is a fundamental concept in probability theory and a key ingredient in various widely used machine learning methods. However, the necessary framework for compiling probabilistic functional programs to density functions has only recently been developed. In this work, we present a density compiler for a probabilistic language with failure and both discrete and continuous distributions, and provide a proof of its soundness. The compiler greatly reduces the development effort of domain experts, which we demonstrate by solving inference problems from various scientific applications, such as modelling the global carbon cycle, using a standard Markov chain Monte Carlo framework.

CRDec 23, 2013
Guiding a General-Purpose C Verifier to Prove Cryptographic Protocols

François Dupressoir, Andrew D. Gordon, Jan Jürjens et al.

We describe how to verify security properties of C code for cryptographic protocols by using a general-purpose verifier. We prove security theorems in the symbolic model of cryptography. Our techniques include: use of ghost state to attach formal algebraic terms to concrete byte arrays and to detect collisions when two distinct terms map to the same byte array; decoration of a crypto API with contracts based on symbolic terms; and expression of the attacker model in terms of C programs. We rely on the general-purpose verifier VCC; we guide VCC to prove security simply by writing suitable header files and annotations in implementation files, rather than by changing VCC itself. We formalize the symbolic model in Coq in order to justify the addition of axioms to VCC.