Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation
This work addresses the need for scalable and cost-effective survey tools for sociological research and policy evaluation, though it is incremental in applying existing LLMs to a new simulation task.
This paper tackles the cost and limited scale of traditional surveys by simulating virtual respondents with Large Language Models (LLMs). It introduces two settings, Partial Attribute Simulation (PAS) and Full Attribute Simulation (FAS), to evaluate response accuracy, and it reports consistent performance trends across multiple LLMs on a benchmark suite spanning 11 real-world datasets.
Questionnaire-based surveys are foundational to social science research and public policymaking, yet traditional survey methods remain costly, time-consuming, and often limited in scale. This paper explores a new paradigm: simulating virtual survey respondents using Large Language Models (LLMs). We introduce two novel simulation settings, namely Partial Attribute Simulation (PAS) and Full Attribute Simulation (FAS), to systematically evaluate the ability of LLMs to generate accurate and demographically coherent responses. In PAS, the model predicts missing attributes based on partial respondent profiles, whereas FAS involves generating complete synthetic datasets under both zero-context and context-enhanced conditions. We curate a comprehensive benchmark suite, LLM-S^3 (Large Language Model-based Sociodemographic Survey Simulation), that spans 11 real-world public datasets across four sociological domains. Our evaluation of multiple mainstream LLMs (GPT-3.5/4 Turbo, LLaMA 3.0/3.1-8B) reveals consistent trends in prediction performance, highlights failure modes, and demonstrates how context and prompt design impact simulation fidelity. This work establishes a rigorous foundation for LLM-driven survey simulations, offering scalable and cost-effective tools for sociological research and policy evaluation. Our code and dataset are available at: https://github.com/dart-lab-research/LLM-S-Cube-Benchmark
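To make the PAS setting concrete, the following is a minimal sketch of how one held-out attribute might be predicted from a partial respondent profile and scored by exact-match accuracy. It is an illustration under stated assumptions, not the benchmark's actual implementation: the function names `build_pas_prompt` and `pas_accuracy`, the prompt wording, the example attribute names, and the exact-match scoring rule are all hypothetical, and the model is passed in as a plain callable so that no particular LLM API is assumed.

```python
from typing import Callable, Dict, Sequence


def build_pas_prompt(profile: Dict[str, str], target: str, options: Sequence[str]) -> str:
    """Render the known attributes and ask for the held-out one.

    The prompt wording is an assumption made for illustration only.
    """
    # List every attribute except the one the model is asked to predict.
    known = "\n".join(f"- {k}: {v}" for k, v in profile.items() if k != target)
    choices = ", ".join(options)
    return (
        "A survey respondent has the following attributes:\n"
        f"{known}\n"
        f"Which of these is the respondent's most likely {target}? "
        f"Options: {choices}. Answer with one option only."
    )


def pas_accuracy(
    records: Sequence[Dict[str, str]],
    target: str,
    options: Sequence[str],
    model: Callable[[str], str],
) -> float:
    """Fraction of records whose held-out attribute the model recovers.

    `model` is any callable mapping a prompt string to an answer string;
    exact (case-insensitive) match is an assumed scoring rule.
    """
    if not records:
        raise ValueError("records must be non-empty")
    correct = 0
    for record in records:
        prompt = build_pas_prompt(record, target, options)
        answer = model(prompt).strip().lower()
        correct += answer == record[target].strip().lower()
    return correct / len(records)


if __name__ == "__main__":
    # Toy records with hypothetical attribute names and values.
    toy_records = [
        {"age": "34", "education": "bachelor", "employment": "employed"},
        {"age": "70", "education": "high school", "employment": "retired"},
    ]
    # Stand-in for an LLM call: always answers "employed".
    stub_model = lambda prompt: "employed"
    print(pas_accuracy(toy_records, "employment", ["employed", "retired"], stub_model))
```

Under the same assumptions, an FAS variant would instead ask the model to produce every attribute of a synthetic respondent and compare the generated dataset with the real one, rather than scoring a single held-out field.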