Daniel G. Goldstein

AI
h-index: 44
6 papers
1,029 citations
Novelty: 48%
AI Score: 45

6 Papers

GT · Mar 26
Agentic Markets: Equilibrium Effects of Improving Consumer Search

Brendan Lucier, Nicole Immorlica, Markus Mobius et al.

Motivated by agentic markets (two-sided markets in which consumers and businesses are assisted by AI tools that facilitate consumers' search), we study the impact of improved search technology on learning and welfare in markets. We put forth a model where consumers engage in costly search to acquire signals of product fit prior to purchase. The market tracks indications of fit for searched products and indications of quality for chosen products, thereby guiding searches. We characterize the long-run steady state of the resulting dynamics as well as the impact of improving search technology. We find that cheaper search improves learning and consumer surplus, whereas more informative search can degrade both unless the market learns as much as consumers about the products by, for example, "reading the transcripts" of agentic conversations. Finally, we consider the impact of search improvements on how businesses set prices. At equilibrium prices in symmetric markets, consumer surplus is improved by cheaper search but may be decreased by more informative search, due to weakened inter-business competition.

HC · Aug 3, 2023
Comparing scalable strategies for generating numerical perspectives

Hancheng Cao, Sofia Eleni Spatharioti, Daniel G. Goldstein et al.

Numerical perspectives help people understand extreme and unfamiliar numbers (e.g., $330 billion is about $1,000 per person in the United States). While research shows perspectives to be helpful, generating them at scale is challenging both because it is difficult to identify what makes some analogies more helpful than others, and because what is most helpful can vary based on the context in which a given number appears. Here we present and compare three policies for large-scale perspective generation: a rule-based approach, a crowdsourced system, and a model that uses Wikipedia data and semantic similarity (via BERT embeddings) to generate context-specific perspectives. We find that the combination of these three approaches dominates any single method, with different approaches excelling in different settings and users displaying heterogeneous preferences across approaches. We conclude by discussing our deployment of perspectives in a widely used online word processor.
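The per-person example above rests on a single division. The Python below is a hypothetical sketch of a rule of that kind, not the paper's rule-based system: the function name `per_capita_perspective` and the population figure of roughly 330 million (the value the $1,000 example itself implies) are assumptions.

```python
def per_capita_perspective(amount_usd: float, population: float, place: str) -> str:
    """Restate a large dollar amount as dollars per person (hypothetical rule)."""
    per_person = amount_usd / population
    billions = amount_usd / 1e9
    # Format both figures with thousands separators and no decimals.
    return (f"${billions:,.0f} billion is about ${per_person:,.0f} "
            f"per person in {place}")


# Assumed population of roughly 330 million, as the example above implies.
print(per_capita_perspective(330e9, 330e6, "the United States"))
# prints: $330 billion is about $1,000 per person in the United States
```

A fixed rule like this ignores the context in which a number appears, which is the limitation the crowdsourced and embedding-based approaches are meant to address.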

MA · Oct 27, 2025
Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets

Gagan Bansal, Wenyue Hua, Zezhou Huang et al. · microsoft-research

As LLM agents advance, they are increasingly mediating economic decisions, ranging from product discovery to transactions, on behalf of users. Such applications promise benefits but also raise many questions about agent accountability and value for users. Addressing these questions requires understanding how agents behave in realistic market conditions. However, previous research has largely evaluated agents in constrained settings, such as single-task marketplaces (e.g., negotiation) or structured two-agent interactions. Real-world markets are fundamentally different: they require agents to handle diverse economic activities and coordinate within large, dynamic ecosystems where multiple agents with opaque behaviors may engage in open-ended dialogues. To bridge this gap, we investigate two-sided agentic marketplaces where Assistant agents represent consumers and Service agents represent competing businesses. To study these interactions safely, we develop Magentic-Marketplace, a simulated environment in which Assistants and Services can operate. This environment enables us to study key market dynamics: the utility agents achieve, behavioral biases, vulnerability to manipulation, and how search mechanisms shape market outcomes. Our experiments show that frontier models can approach optimal welfare, but only under ideal search conditions. Performance degrades sharply with scale, and all models exhibit severe first-proposal bias, creating 10-30x advantages for response speed over quality. These findings reveal how behaviors emerge across market conditions, informing the design of fair and efficient agentic marketplaces.

CY · May 9, 2020
How good is good enough for COVID19 apps? The influence of benefits, accuracy, and privacy on willingness to adopt

Gabriel Kaptchuk, Daniel G. Goldstein, Eszter Hargittai et al.

A growing number of contact tracing apps are being developed to complement manual contact tracing. A key question is whether users will be willing to adopt these contact tracing apps. In this work, we survey over 4,500 Americans to evaluate (1) the effect of both accuracy and privacy concerns on reported willingness to install COVID19 contact tracing apps and (2) how different groups of users weight accuracy vs. privacy. Drawing on our findings from these first two research questions, we (3) quantitatively model how the amount of public health benefit (reduction in infection rate), amount of individual benefit (true-positive detection of exposures to COVID), and degree of privacy risk in a hypothetical contact tracing app may influence Americans' willingness to install. Our work takes a descriptive ethics approach toward offering implications for the development of policy and app designs related to COVID19.

AI · Feb 21, 2018
Manipulating and Measuring Model Interpretability

Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman et al.

With machine learning models being increasingly used to aid decision making even in high-stakes domains, there has been a growing interest in developing interpretable models. Although many supposedly interpretable models have been proposed, there have been relatively few experimental studies investigating whether these models achieve their intended effects, such as making people more closely follow a model's predictions when it is beneficial for them to do so or enabling them to detect when a model has made a mistake. We present a sequence of pre-registered experiments (N=3,800) in which we showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box). Predictably, participants who saw a clear model with few features could better simulate the model's predictions. However, we did not find that participants more closely followed its predictions. Furthermore, showing participants a clear model meant that they were less able to detect and correct for the model's sizable mistakes, seemingly due to information overload. These counterintuitive findings emphasize the importance of testing over intuition when developing interpretable models.

AP · Feb 15, 2017
Simple rules for complex decisions

Jongbin Jung, Connor Concannon, Ravi Shroff et al.

From doctors diagnosing patients to judges setting bail, experts often base their decisions on experience and intuition rather than on statistical models. While understandable, relying on intuition over models has often been found to result in inferior outcomes. Here we present a new method, select-regress-and-round, for constructing simple rules that perform well for complex decisions. These rules take the form of a weighted checklist, can be applied mentally, and nonetheless rival the performance of modern machine learning algorithms. Our method for creating these rules is itself simple, and can be carried out by practitioners with basic statistics knowledge. We demonstrate this technique with a detailed case study of judicial decisions to release or detain defendants while they await trial. In this application, as in many policy settings, the effects of proposed decision rules cannot be directly observed from historical data: if a rule recommends releasing a defendant whom the judge actually detained, we do not observe what would have happened under the proposed action. We address this key counterfactual estimation problem by drawing on tools from causal inference. We find that simple rules significantly outperform judges and are on par with decisions derived from random forests trained on all available features. Generalizing to 22 varied decision-making domains, we find this basic result replicates. We conclude with an analytical framework that helps explain why these simple decision rules perform as well as they do.
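The abstract names the three steps of select-regress-and-round but not their details. The Python below is one plausible reading under stated assumptions, not the authors' implementation: the selection criterion (largest absolute coefficient in a full logistic fit), the unregularized gradient-descent logistic regression, the integer range M, the toy data, and all function names (`fit_logistic`, `select_regress_round`, `checklist_score`) are hypothetical stand-ins. It uses only the standard library.

```python
import itertools
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Unregularized logistic regression fit by batch gradient descent.

    Returns (weights, intercept). A stand-in for the regression step; the
    original method may use a different (e.g. regularized) estimator.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def select_regress_round(X, y, k, M):
    """Hypothetical select-regress-and-round style rule builder.

    1. Select: keep the k features with the largest |coefficient| in a full
       logistic fit (an assumed, simple selection criterion).
    2. Regress: refit logistic regression on the selected features only.
    3. Round: rescale so the largest |coefficient| becomes M, then round
       every coefficient to an integer in [-M, M].
    Returns a dict {feature_index: integer_weight}.
    """
    full_w, _ = fit_logistic(X, y)
    selected = sorted(range(len(full_w)), key=lambda j: -abs(full_w[j]))[:k]
    X_sel = [[row[j] for j in selected] for row in X]
    w, _ = fit_logistic(X_sel, y)
    scale = max(abs(wj) for wj in w)
    if scale == 0:
        return {j: 0 for j in selected}
    return {j: round(M * wj / scale) for j, wj in zip(selected, w)}


def checklist_score(rule, x):
    # Weighted checklist: add up the integer points for each feature.
    return sum(weight * x[j] for j, weight in rule.items())


# Toy data (hypothetical): four features coded as -1/+1 over all 16
# combinations; the outcome depends only on feature 0.
X = [list(row) for row in itertools.product([-1, 1], repeat=4)]
y = [1 if row[0] > 0 else 0 for row in X]
rule = select_regress_round(X, y, k=2, M=3)
print(rule)
print(checklist_score(rule, [1, -1, 1, -1]))
```

Because the rounded weights are small integers, a rule built this way can be applied mentally: add the points for each feature and compare the total with a threshold (choosing that threshold is left out of this sketch).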