LG, AI | Sep 30, 2025

Predicting Effects, Missing Distributions: Evaluating LLMs as Human Behavior Simulators in Operations Management

arXiv:2510.03310v1 | 1 citation | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of using LLMs as lower-cost simulators of human behavior in operations management. It is rated incremental because it builds on existing evaluation methods.

The paper evaluates how well large language models (LLMs) replicate human behavior in operations management. It finds that they reproduce most hypothesis-level effects, but that their response distributions diverge from human data. Interventions such as chain-of-thought prompting reduce this misalignment.

LLMs are emerging tools for simulating human behavior in business, economics, and social science, offering a lower-cost complement to laboratory experiments, field studies, and surveys. This paper evaluates how well LLMs replicate human behavior in operations management. Using nine published experiments in behavioral operations, we assess two criteria: replication of hypothesis-test outcomes and distributional alignment via Wasserstein distance. LLMs reproduce most hypothesis-level effects, capturing key decision biases, but their response distributions diverge from human data, including for strong commercial models. We also test two lightweight interventions -- chain-of-thought prompting and hyperparameter tuning -- which reduce misalignment and can sometimes let smaller or open-source models match or surpass larger systems.
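The second criterion, distributional alignment via Wasserstein distance, can be illustrated with a minimal sketch. It assumes one-dimensional numeric responses (for example, order quantities) and computes the 1-Wasserstein distance between two empirical samples as the area between their empirical CDFs. The function name `wasserstein_1d` and the sample values are hypothetical and are not taken from the paper or its code.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two empirical 1-D samples.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and
    F_v are the empirical CDFs of the two samples.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    if not u_sorted or not v_sorted:
        raise ValueError("both samples must be non-empty")

    # The CDFs only change at observed values, so integrate piecewise
    # between consecutive distinct points of the pooled sample.
    points = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Hypothetical data: both samples have mean 55, but the simulated
# responses show none of the spread present in the human responses.
human_orders = [40, 50, 60, 70]
llm_orders = [55, 55, 55, 55]

distance = wasserstein_1d(human_orders, llm_orders)
print(f"Wasserstein distance: {distance:.1f}")
```

In this toy case the two samples share the same mean, so a test on means would see no difference, yet the distance is well above zero. That is the kind of gap the abstract describes: an effect can replicate at the hypothesis level while the response distributions still diverge.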

