Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility
This work identifies systematic biases in LLM-based survey simulations used in computational social science. It shows that such simulations are better suited to diagnosing divergences from human judgment than to substituting for it, an incremental but important finding for researchers who use LLMs as proxies.
The study tested whether LLM-simulated survey respondents could reproduce human patterns of misinformation belief and sharing, using social survey data as the baseline. The LLMs captured broad distributional tendencies and correlated modestly with human responses, but they consistently overstated the belief-sharing association and placed disproportionate weight on attitudinal and behavioral features while largely ignoring personal network characteristics.
Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science, yet their ability to reproduce patterns of susceptibility to misinformation remains unclear. We test whether LLM-simulated survey respondents, prompted with participant profiles drawn from social survey data measuring network, demographic, attitudinal and behavioral features, can reproduce human patterns of misinformation belief and sharing. Using three online surveys as baselines, we evaluate whether LLM outputs match observed response distributions and recover feature-outcome associations present in the original survey data. LLM-generated responses capture broad distributional tendencies and show modest correlation with human responses, but consistently overstate the association between belief and sharing. Linear models fit to simulated responses exhibit substantially higher explained variance and place disproportionate weight on attitudinal and behavioral features, while largely ignoring personal network characteristics, relative to models fit to human responses. Analyses of model-generated reasoning and LLM training data suggest that these distortions reflect systematic biases in how misinformation-related concepts are represented. Our findings suggest that LLM-based survey simulations are better suited for diagnosing systematic divergences from human judgment than for substituting it.
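The comparison described above (fitting linear models to human and simulated responses, then contrasting explained variance, feature weights, and the belief-sharing association) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the feature names (`network_size`, `attitude_score`), the helper `fit_ols`, and all data are hypothetical. The outcomes are synthetic and deliberately constructed to mimic the qualitative pattern reported in the abstract, so the printed numbers say nothing about the actual study.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (slope coefficients, R^2). Hypothetical helper for illustration.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta[1:], 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n = 2000

# Hypothetical standardized profile features: [network_size, attitude_score].
X = rng.normal(size=(n, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# SYNTHETIC outcomes, built by assumption to mimic the reported pattern:
# "human" belief depends on both features and is noisy;
# "simulated" belief depends almost only on attitude and is much less noisy.
human_belief = 0.3 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=1.0, size=n)
sim_belief = 0.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Sharing is tied loosely to belief for "humans", tightly for "simulated".
human_share = 0.3 * human_belief + rng.normal(scale=1.0, size=n)
sim_share = 0.9 * sim_belief + rng.normal(scale=0.3, size=n)

# Same model specification fit to each response source.
coef_h, r2_h = fit_ols(X, human_belief)
coef_s, r2_s = fit_ols(X, sim_belief)

# Belief-sharing association within each source.
corr_h = np.corrcoef(human_belief, human_share)[0, 1]
corr_s = np.corrcoef(sim_belief, sim_share)[0, 1]

print("R^2 human vs simulated:", round(r2_h, 3), round(r2_s, 3))
print("network weight human vs simulated:", round(coef_h[0], 3), round(coef_s[0], 3))
print("belief-sharing correlation human vs simulated:", round(corr_h, 3), round(corr_s, 3))
```

Fitting the same specification to both response sources keeps the coefficients and R^2 values directly comparable. With real data, the two outcome vectors would come from the survey and from the LLM outputs for matched profiles rather than from a random generator.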