David Wingate

CL
h-index10
26papers
2,963citations
Novelty47%
AI Score48

26 Papers

LGSep 14, 2022
Out of One, Many: Using Language Models to Simulate Human Samples

Lisa P. Argyle, Ethan C. Busby, Nancy Fulda et al. · uw

We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property "algorithmic fidelity" and explore its extent in GPT-3. We create "silicon samples" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.

CLOct 6, 2022
Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models

David Wingate, Mohammad Shoeybi, Taylor Sorensen · uw

We explore the idea of compressing the prompts used to condition language models, and show that compressed prompts can retain a substantive amount of information about the original prompt. For severely compressed prompts, while fine-grained information is lost, abstract information and general sentiments can be retained with surprisingly few parameters, which can be useful in the context of decode-time algorithms for controllability and toxicity reduction. We explore contrastive conditioning to steer language model generation towards desirable text and away from undesirable text, and find that some complex prompts can be effectively compressed into a single token to guide generation. We also show that compressed prompts are largely compositional, and can be constructed such that they can be used to control independent aspects of generated text.

CLOct 22, 2022
Leveraging Large Language Models for Multiple Choice Question Answering

Joshua Robinson, Christopher Michael Rytting, David Wingate · uw

While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero, one, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs like cloze tasks. An LLM is conditioned on a question (without the associated answer options) and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., "A") associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs, and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM it is used with must be able to associate answer options with the symbols that represent them. The LLM needs what we term multiple choice symbol binding (MCSB) ability. This ability varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.

CLMar 21, 2022
An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels

Taylor Sorensen, Joshua Robinson, Christopher Michael Rytting et al. · uw

Pre-trained language models derive substantial linguistic and factual knowledge from the massive corpora on which they are trained, and prompt engineering seeks to align these models to specific tasks. Unfortunately, existing prompt engineering methods require significant amounts of labeled data, access to model parameters, or both. We introduce a new method for selecting prompt templates \textit{without labeled examples} and \textit{without direct access to the model}. Specifically, over a set of candidate templates, we choose the template that maximizes the mutual information between the input and the corresponding model output. Across 8 datasets representing 7 distinct NLP tasks, we show that when a template has high mutual information, it also has high accuracy on the task. On the largest model, selecting prompts with our method gets 90\% of the way from the average prompt accuracy to the best prompt accuracy and requires no ground truth labels.

88.0LGMay 23
Omissive Bias in Religious Representation: Benchmarking LLM Answers to Everyday Ethical Decision-making

David Wingate, Sheryl Carty, Joshua Coates et al.

As large language models become a default source of guidance on personal, moral, and existential questions, it matters whether they draw on the religious frameworks that have historically shaped such reasoning, or systematically omit them. In this paper, we ask a deliberately narrow question: when posed an everyday ethical question for which religious perspectives may be valuable, do LLMs invoke religion at all? In contrast to benchmarks that look for the presence of political leanings or social bias, we look for the absence of religious representation as a dimension of value alignment and bias in LLMs. We term this ``omissive bias.'' To measure omissive bias, we contribute the AllFaith Religious Representation Benchmark: 150 ethically and personally salient questions, sourced from in-the-wild chat transcripts and faith-community contributors, paired with an LLM-as-judge rubric that gives full credit for any mention of a religion, a religious practice, or a religious leader. The questions are not themselves about religion--they are open-ended questions about grief, forgiveness, relationships, purpose, and honesty, where religion is one valuable perspective among several. We also run a human-subjects survey to compare LLM behavior against human expectations. Evaluating 27 models, we find that LLMs consistently underrepresent religion relative to human expectations. The omission is asymmetric: models invoke religion more readily for abstract existential questions (meaning, death, truth) than for the practical personal situations--grief, marriage, family conflict, addiction--where many people most rely on it. It is not our purpose to adjudicate which values LLMs should hold. We argue, more modestly, that current LLM responses overlook critical opportunities to reflect religious frameworks that many people draw on when navigating personal and ethical challenges.

HCFeb 14, 2023
AI Chat Assistants can Improve Conversations about Divisive Topics

Lisa P. Argyle, Ethan Busby, Joshua Gubler et al. · uw

A rapidly increasing amount of human conversation occurs online. But divisiveness and conflict can fester in text-based interactions on social media platforms, in messaging apps, and on other digital forums. Such toxicity increases polarization and, importantly, corrodes the capacity of diverse societies to develop efficient solutions to complex social problems that impact everyone. Scholars and civil society groups promote interventions that can make interpersonal conversations less divisive or more productive in offline settings, but scaling these efforts to the amount of discourse that occurs online is extremely challenging. We present results of a large-scale experiment that demonstrates how online conversations about divisive topics can be improved with artificial intelligence tools. Specifically, we employ a large language model to make real-time, evidence-based recommendations intended to improve participants' perception of feeling understood in conversations. We find that these interventions improve the reported quality of the conversation, reduce political divisiveness, and improve the tone, without systematically changing the content of the conversation or moving people's policy attitudes. These findings have important implications for future research on social media, political deliberation, and the growing community of scholars interested in the place of artificial intelligence within computational social science.

52.4CLMay 19
Language models struggle with compartmentalization

Thomas Vincent Howe, David Wingate

In the training data used by large language models (LLMs), the same latent concept is often presented in multiple distinct ways: the same facts appear in English and Swahili; many functions can be expressed in both Python and Haskell; we can express propositions in both formal and natural language. We show that LLMs can exhibit compartmentalization, where they fail to identify and share statistical strength between distinct presentations of unified concepts. In the worst case, LLMs simply learn parallel internal representations of each presentation of the concept, saturating model capacity with redundancies and decreasing sample efficiency with the number of such presentations. We also demonstrate that synthetic parallel data can fail to improve this despite being easily learned itself. Under this framework, we find that, for small models, early multilingual learning is nearly entirely compartmentalized. Finally, all interventions that we study exhibit a phase transition in which their effectiveness depends on the number of distinct presentations, suggesting that the language modeling objective may only inconsistently unify representations.

AIJun 3, 2023
Towards Coding Social Science Datasets with Language Models

Christopher Michael Rytting, Taylor Sorensen, Lisa Argyle et al.

Researchers often rely on humans to code (label, annotate, etc.) large sets of texts. This kind of human coding forms an important part of social science research, yet the coding process is both resource intensive and highly variable from application to application. In some cases, efforts to automate this process have achieved human-level accuracies, but to achieve this, these attempts frequently rely on thousands of hand-labeled training examples, which makes them inapplicable to small-scale research studies and costly for large ones. Recent advances in a specific kind of artificial intelligence tool - language models (LMs) - provide a solution to this problem. Work in computer science makes it clear that LMs are able to classify text, without the cost (in financial terms and human effort) of alternative methods. To demonstrate the possibilities of LMs in this area of political science, we use GPT-3, one of the most advanced LMs, as a synthetic coder and compare it to human coders. We find that GPT-3 can match the performance of typical human coders and offers benefits over other machine learning methods of coding text. We find this across a variety of domains using very different coding procedures. This provides exciting evidence that language models can serve as a critical advance in the coding of open-ended texts in a variety of applications.

LGNov 15, 2024
Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning

Jeffrey Olmo, Jared Wilson, Max Forsey et al.

Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations by learning a sparse and overcomplete decomposition of the network's internal activations. However, SAEs are traditionally trained considering only activation values and not the effect those activations have on downstream computations. This limits the information available to learn features, and biases the autoencoder towards neglecting features which are represented with small activation values but strongly influence model outputs. To address this, we introduce Gradient SAEs (g-SAEs), which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function to rely on the gradients of the input activation when selecting the $k$ elements. For a given sparsity level, g-SAEs produce reconstructions that are more faithful to original network performance when propagated through the network. Additionally, we find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts. By considering the downstream effects of activations, our approach leverages the dual nature of neural network features as both $\textit{representations}$, retrospectively, and $\textit{actions}$, prospectively. While previous methods have approached the problem of feature discovery primarily focused on the former aspect, g-SAEs represent a step towards accounting for the latter as well.

CYApr 4, 2025
Arti-"fickle" Intelligence: Using LLMs as a Tool for Inference in the Political and Social Sciences

Lisa P. Argyle, Ethan C. Busby, Joshua R. Gubler et al.

Generative large language models (LLMs) are incredibly useful, versatile, and promising tools. However, they will be of most use to political and social science researchers when they are used in a way that advances understanding about real human behaviors and concerns. To promote the scientific use of LLMs, we suggest that researchers in the political and social sciences need to remain focused on the scientific goal of inference. To this end, we discuss the challenges and opportunities related to scientific inference with LLMs, using validation of model output as an illustrative case for discussion. We propose a set of guidelines related to establishing the failure and success of LLMs when completing particular tasks, and discuss how we can make inferences from these observations. We conclude with a discussion of how this refocus will improve the accumulation of shared scientific knowledge about these tools and their uses in the social sciences.

CLOct 5, 2021
Leveraging the Inductive Bias of Large Language Models for Abstract Textual Reasoning

Christopher Michael Rytting, David Wingate

Large natural language models (such as GPT-3 or T5) demonstrate impressive abilities across a range of general NLP tasks. Here, we show that the knowledge embedded in such models provides a useful inductive bias, not just on traditional NLP tasks, but also in the nontraditional task of training a symbolic reasoning engine. We observe that these engines learn quickly and generalize in a natural way that reflects human intuition. For example, training such a system to model block-stacking might naturally generalize to stacking other types of objects because of structure in the real world that has been partially captured by the language describing it. We study several abstract textual reasoning tasks, such as object manipulation and navigation, and demonstrate multiple types of generalization to novel scenarios and the symbols that comprise them. We also demonstrate the surprising utility of \textit{compositional learning}, where a learner dedicated to mastering a complicated task gains an advantage by training on relevant simpler tasks instead of jumping straight to the complicated task.

CLDec 10, 2020
Towards Neural Programming Interfaces

Zachary C. Brown, Nathaniel Robinson, David Wingate et al.

It is notoriously difficult to control the behavior of artificial neural networks such as generative neural language models. We recast the problem of controlling natural language generation as that of learning to interface with a pretrained language model, just as Application Programming Interfaces (APIs) control the behavior of programs by altering hyperparameters. In this new paradigm, a specialized neural network (called a Neural Programming Interface or NPI) learns to interface with a pretrained language model by manipulating the hidden activations of the pretrained model to produce desired outputs. Importantly, no permanent changes are made to the weights of the original model, allowing us to re-purpose pretrained models for new tasks without overwriting any aspect of the language model. We also contribute a new data set construction algorithm and GAN-inspired loss function that allows us to train NPI models to control outputs of autoregressive transformers. In experiments against other state-of-the-art approaches, we demonstrate the efficacy of our methods using OpenAI's GPT-2 model, successfully controlling noun selection, topic aversion, offensive speech filtering, and other aspects of language while largely maintaining the controlled model's fluency under deterministic settings.

ROJan 3, 2020
Human-robot co-manipulation of extended objects: Data-driven models and control from analysis of human-human dyads

Erich Mielke, Eric Townsend, David Wingate et al.

Human teams are able to easily perform collaborative manipulation tasks. However, for a robot and human to simultaneously manipulate an extended object is a difficult task using existing methods from the literature. Our approach in this paper is to use data from human-human dyad experiments to determine motion intent which we use for a physical human-robot co-manipulation task. We first present and analyze data from human-human dyads performing co-manipulation tasks. We show that our human-human dyad data has interesting trends including that interaction forces are non-negligible compared to the force required to accelerate an object and that the beginning of a lateral movement is characterized by distinct torque triggers from the leader of the dyad. We also examine different metrics to quantify performance of different dyads. We also develop a deep neural network based on motion data from human-human trials to predict human intent based on past motion. We then show how force and motion data can be used as a basis for robot control in a human-robot dyad. Finally, we compare the performance of two controllers for human-robot co-manipulation to human-human dyad performance.

LGOct 3, 2019
Using Logical Specifications of Objectives in Multi-Objective Reinforcement Learning

Kolby Nottingham, Anand Balakrishnan, Jyotirmoy Deshmukh et al.

It is notoriously difficult to control the behavior of reinforcement learning agents. Agents often learn to exploit the environment or reward signal and need to be retrained multiple times. The multi-objective reinforcement learning (MORL) framework separates a reward function into several objectives. An ideal MORL agent learns to generalize to novel combinations of objectives allowing for better control of an agent's behavior without requiring retraining. Many MORL approaches use a weight vector to parameterize the importance of each objective. However, this approach suffers from lack of expressiveness and interpretability. We propose using propositional logic to specify the importance of multiple objectives. By using a logic where predicates correspond directly to objectives, specifications are inherently more interpretable. Additionally the set of specifications that can be expressed with formal languages is a superset of what can be expressed by weight vectors. In this paper, we define a formal language based on propositional logic with quantitative semantics. We encode logical specifications using a recurrent neural network and show that MORL agents parameterized by these encodings are able to generalize to novel specifications over objectives and achieve performance comparable to single objective baselines.

LGOct 1, 2019
Wasserstein Neural Processes

Andrew Carr, Jared Nielsen, David Wingate

Neural Processes (NPs) are a class of models that learn a mapping from a context set of input-output pairs to a distribution over functions. They are traditionally trained using maximum likelihood with a KL divergence regularization term. We show that there are desirable classes of problems where NPs, with this loss, fail to learn any reasonable distribution. We also show that this drawback is solved by using approximations of Wasserstein distance which calculates optimal transport distances even for distributions of disjoint support. We give experimental justification for our method and demonstrate performance. These Wasserstein Neural Processes (WNPs) maintain all of the benefits of traditional NPs while being able to approximate a new class of function mappings.

CVMar 1, 2019
Video Extrapolation with an Invertible Linear Embedding

Robert Pottorff, Jared Nielsen, David Wingate

We predict future video frames from complex dynamic scenes, using an invertible neural network as the encoder of a nonlinear dynamic system with latent linear state evolution. Our invertible linear embedding (ILE) demonstrates successful learning, prediction and latent state inference. In contrast to other approaches, ILE does not use any explicit reconstruction loss or simplistic pixel-space assumptions. Instead, it leverages invertibility to optimize the likelihood of image sequences exactly, albeit indirectly. Comparison with a state-of-the-art method demonstrates the viability of our approach.

LGFeb 26, 2019
Graph Neural Processes: Towards Bayesian Graph Neural Networks

Andrew Carr, David Wingate

We introduce Graph Neural Processes (GNP), inspired by the recent work in conditional and latent neural processes. A Graph Neural Process is defined as a Conditional Neural Process that operates on arbitrary graph data. It takes features of sparsely observed context points as input, and outputs a distribution over target points. We demonstrate graph neural processes in edge imputation and discuss benefits and drawbacks of the method for other application areas. One major benefit of GNPs is the ability to quantify uncertainty in deep learning on graph structures. An additional benefit of this method is the ability to extend graph neural networks to inputs of dynamic sized graphs.

AIDec 4, 2018
Nested Reasoning About Autonomous Agents Using Probabilistic Programs

Iris Rubi Seaman, Jan-Willem van de Meent, David Wingate

As autonomous agents become more ubiquitous, they will eventually have to reason about the plans of other agents, which is known as theory of mind reasoning. We develop a planning-as-inference framework in which agents perform nested simulation to reason about the behavior of other agents in an online manner. As a concrete application of this framework, we use probabilistic programs to model a high-uncertainty variant of pursuit-evasion games in which an agent must make inferences about the other agents' plans to craft counter-plans. Our probabilistic programs incorporate a variety of complex primitives such as field-of-view calculations and path planners, which enable us to model quasi-realistic scenarios in a computationally tractable manner. We perform extensive experimental evaluations which establish a variety of rational behaviors and quantify how allocating computation across levels of nesting affects the variance of our estimators.

CLAug 14, 2018
Embedding Grammars

David Wingate, William Myers, Nancy Fulda et al.

Classic grammars and regular expressions can be used for a variety of purposes, including parsing, intent detection, and matching. However, the comparisons are performed at a structural level, with constituent elements (words or characters) matched exactly. Recent advances in word embeddings show that semantically related words share common features in a vector-space representation, suggesting the possibility of a hybrid grammar and word embedding. In this paper, we blend the structure of standard context-free grammars with the semantic generalization capabilities of word embeddings to create hybrid semantic grammars. These semantic grammars generalize the specific terminals used by the programmer to other words and phrases with related meanings, allowing the construction of compact grammars that match an entire region of the vector space rather than matching specific elements.

ROMay 30, 2017
Estimating Human Intent for Physical Human-Robot Co-Manipulation

Eric C. Townsend, Erich A Mielke, David Wingate et al.

Human teams can be exceptionally efficient at adapting and collaborating during manipulation tasks using shared mental models. However, the same shared mental models that can be used by humans to perform robust low-level force and motion control during collaborative manipulation tasks are non-existent for robots. For robots to perform collaborative tasks with people naturally and efficiently, understanding and predicting human intent is necessary. However, humans are difficult to predict and model. We have completed an exploratory study recording motion and force for 20 human dyads moving an object in tandem in order to better understand how they move and how their movement can be predicted. In this paper, we show how past motion data can be used to predict human intent. In order to predict human intent, which we equate with the human team's velocity for a short time horizon, we used a neural network. Using the previous 150 time steps at a rate of 200 Hz, human intent can be predicted for the next 50 time steps with a mean squared error of 0.02 (m/s)^2. We also show that human intent can be estimated in a human-robot dyad. This work is an important first step in enabling future work of integrating human intent estimation on a robot controller to execute a short-term collaborative trajectory.

AIApr 17, 2017
Probabilistic programs for inferring the goals of autonomous agents

Marco F. Cusumano-Towner, Alexey Radul, David Wingate et al.

Intelligent systems sometimes need to infer the probable goals of people, cars, and robots, based on partial observations of their motion. This paper introduces a class of probabilistic programs for formulating and solving these problems. The formulation uses randomized path planning algorithms as the basis for probabilistic models of the process by which autonomous agents plan to achieve their goals. Because these path planning algorithms do not have tractable likelihood functions, new inference algorithms are needed. This paper proposes two Monte Carlo techniques for these "likelihood-free" models, one of which can use likelihood estimates from neural networks to accelerate inference. The paper demonstrates efficacy on three simple examples, each using under 50 lines of probabilistic code.

AIMar 9, 2017
What can you do with a rock? Affordance extraction via word embeddings

Nancy Fulda, Daniel Ricks, Ben Murdoch et al.

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance detection is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance most of the time. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.

MLJan 7, 2013
Automated Variational Inference in Probabilistic Programming

David Wingate, Theophane Weber

We present a new algorithm for approximate inference in probabilistic programs, based on a stochastic gradient for variational programs. This method is efficient without restrictions on the probabilistic program; it is particularly practical for distributions which are not analytically tractable, including highly structured distributions that arise in probabilistic programs. We show how to automatically derive mean-field probabilistic programs and optimize them, and demonstrate that our perspective improves inference efficiency over other algorithms.

AIJul 4, 2012
Predictive Linear-Gaussian Models of Stochastic Dynamical Systems

Matthew Rudary, Satinder Singh, David Wingate

Models of dynamical systems based on predictive state representations (PSRs) are defined strictly in terms of observable quantities, in contrast with traditional models (such as Hidden Markov Models) that use latent variables or statespace representations. In addition, PSRs have an effectively infinite memory, allowing them to model some systems that finite memory-based models cannot. Thus far, PSR models have primarily been developed for domains with discrete observations. Here, we develop the Predictive Linear-Gaussian (PLG) model, a class of PSR models for domains with continuous observations. We show that PLG models subsume Linear Dynamical System models (also called Kalman filter models or state-space models) while using fewer parameters. We also introduce an algorithm to estimate PLG parameters from data, and contrast it with standard Expectation Maximization (EM) algorithms used to estimate Kalman filter parameters. We show that our algorithm is a consistent estimation procedure and present preliminary empirical results suggesting that our algorithm outperforms EM, particularly as the model dimension increases.

MLMay 9, 2012
The Infinite Latent Events Model

David Wingate, Noah Goodman, Daniel Roy et al.

We present the Infinite Latent Events Model, a nonparametric hierarchical Bayesian distribution over infinite dimensional Dynamic Bayesian Networks with binary state representations and noisy-OR-like transitions. The distribution can be used to learn structure in discrete timeseries data by simultaneously inferring a set of latent events, which events fired at each timestep, and how those events are causally linked. We illustrate the model on a sound factorization task, a network topology identification task, and a video game task.

LGMay 9, 2012
A Bayesian Sampling Approach to Exploration in Reinforcement Learning

John Asmuth, Lihong Li, Michael L. Littman et al.

We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves nearoptimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a non-parametric model that generalizes across states.