The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
For researchers using LLMs as human simulators in causal inference, this paper identifies a critical source of confounding bias and provides diagnostics for it, though the proposed mitigation is incremental.
LLM-simulated experiments can suffer from 'user drift', in which interventions shift latent user attributes and bias effect estimates. The authors formalize this bias, propose negative control outcomes to detect it, and show that augmenting persona specifications with relevant confounders reduces bias in survey and multi-turn evaluations.
Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes. This causes user drift, in which the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise from user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes (attributes that should remain invariant under intervention) to identify distribution shifts across intervention conditions, providing evidence of user drift. To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.
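To make the mechanism concrete, here is a minimal toy simulation that stands in for LLM-simulated users. It assumes a single binary latent attribute Z, a linear outcome, and a binary negative control that depends on Z but not on the intervention. All names (`simulate_arm`, `negative_control_z`, `TRUE_EFFECT`, and so on), the effect sizes, and the choice of a two-proportion z-test and post-hoc stratification are illustrative assumptions, not the paper's method, models, or data. The sketch shows three things: drift in Z inflates the naive difference in means, the negative control's distribution shifts across arms when drift is present, and conditioning on Z recovers an estimate close to the true effect.

```python
import math
import random
from statistics import mean

# Hypothetical toy setup (illustrative only, not the paper's data or models):
#   T  : intervention arm (0 = control, 1 = treatment)
#   Z  : latent user attribute the simulator implicitly shifts ("user drift")
#   Y  : outcome of interest, Y = TRUE_EFFECT * T + Z_EFFECT * Z + noise
#   NC : negative control outcome, depends on Z but NOT on T
TRUE_EFFECT = 1.0
Z_EFFECT = 2.0


def simulate_arm(t, p_z, n, rng):
    """Draw n synthetic users for arm t, where p_z is P(Z=1) in that arm."""
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < p_z else 0
        y = TRUE_EFFECT * t + Z_EFFECT * z + rng.gauss(0.0, 1.0)
        nc = 1 if rng.random() < 0.2 + 0.5 * z else 0
        rows.append((t, z, y, nc))
    return rows


def naive_effect(rows):
    """Difference in mean outcome between arms, ignoring Z."""
    treated = [y for t, _, y, _ in rows if t == 1]
    control = [y for t, _, y, _ in rows if t == 0]
    return mean(treated) - mean(control)


def negative_control_z(rows):
    """Two-proportion z-statistic comparing P(NC=1) across arms.

    NC should be invariant to the intervention, so a large |z| is
    evidence that the simulated population drifted between arms."""
    treated = [nc for t, _, _, nc in rows if t == 1]
    control = [nc for t, _, _, nc in rows if t == 0]
    p1, p0 = mean(treated), mean(control)
    pooled = (sum(treated) + sum(control)) / (len(treated) + len(control))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(treated) + 1 / len(control)))
    return (p1 - p0) / se


def adjusted_effect(rows):
    """Stratify on Z and average the within-stratum differences in means,
    weighting each stratum by its share of all users."""
    total, estimate = len(rows), 0.0
    for z_val in (0, 1):
        stratum = [r for r in rows if r[1] == z_val]
        treated = [y for t, _, y, _ in stratum if t == 1]
        control = [y for t, _, y, _ in stratum if t == 0]
        estimate += (len(stratum) / total) * (mean(treated) - mean(control))
    return estimate


def run(p_z_control, p_z_treated, n=5000, seed=0):
    """Simulate both arms and return (naive, negative-control z, adjusted)."""
    rng = random.Random(seed)
    rows = simulate_arm(0, p_z_control, n, rng) + simulate_arm(1, p_z_treated, n, rng)
    return naive_effect(rows), negative_control_z(rows), adjusted_effect(rows)
```

For example, `run(0.3, 0.7)` models drift in which the intervention raises P(Z=1) from 0.3 to 0.7. The naive estimate then centers near TRUE_EFFECT + Z_EFFECT × 0.4 = 1.8 rather than 1.0, and the negative control's z-statistic is large. `run(0.5, 0.5)` loosely mimics a persona specification that fixes Z identically in both arms: the naive estimate centers near 1.0, and the negative control shows no systematic shift. Stratification here is a post-hoc analytic substitute for the paper's approach of adding confounders to the persona specification, and it only works when the relevant Z is actually observed.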