CL Oct 13, 2025

Valid Survey Simulations with Limited Human Data: The Roles of Prompting, Fine-Tuning, and Rectification

Stefan Krsteski, Giuseppe Russo, Serina Chang, Robert West, Kristina Gligorić

arXiv:2510.11408v2 13.98 citations h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses the cost and slowness of running surveys, a concern for both researchers and practitioners, and offers an incremental improvement in estimation methods.

The paper examines the use of large language models (LLMs) as scalable substitutes for human survey respondents, whose outputs often yield biased estimates. It finds that combining synthesis methods with rectification reduces bias below 5% and increases effective sample size by up to 14%.

Surveys provide valuable insights into public opinion and behavior, but their execution is costly and slow. Large language models (LLMs) have been proposed as a scalable, low-cost substitute for human respondents, but their outputs are often biased and yield invalid estimates. We study the interplay between synthesis methods that use LLMs to generate survey responses and rectification methods that debias population estimates, and explore how human responses are best allocated between them. Using two panel surveys with questions on nutrition, politics, and economics, we find that synthesis alone introduces substantial bias (24-86%), whereas combining it with rectification reduces bias below 5% and increases effective sample size by up to 14%. Overall, we challenge the common practice of using all human responses for fine-tuning, showing that under a fixed budget, allocating most to rectification results in far more effective estimation.
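To make the idea of rectification concrete, here is a minimal sketch of one simple way to debias an estimate. The abstract does not specify which rectification method the authors use, so the difference estimator below is an assumption chosen for illustration. It takes the mean of many LLM-synthesized responses and shifts it by the average human-minus-synthetic gap measured on a small paired subset. The function names `rectified_mean` and `percent_bias` and all numbers are hypothetical.

```python
import statistics


def rectified_mean(synthetic_all, synthetic_paired, human_paired):
    """Difference-estimator sketch (an assumption, not the paper's stated method).

    synthetic_all:    LLM-generated responses for a large pool of respondents
    synthetic_paired: LLM-generated responses for respondents who also answered
    human_paired:     the matching human responses, in the same order
    """
    if len(synthetic_paired) != len(human_paired):
        raise ValueError("paired samples must have the same length")
    # Average gap between human and synthetic answers on the paired subset.
    correction = statistics.fmean(
        h - s for h, s in zip(human_paired, synthetic_paired)
    )
    # Shift the synthetic-only estimate by the measured gap.
    return statistics.fmean(synthetic_all) + correction


def percent_bias(estimate, truth):
    """Absolute relative error of an estimate, in percent."""
    return 100.0 * abs(estimate - truth) / abs(truth)


# Toy numbers, invented for illustration only.
synthetic_all = [0.8, 0.6, 0.7, 0.9]
synthetic_paired = [0.8, 0.6]
human_paired = [0.5, 0.4]

naive = statistics.fmean(synthetic_all)
rectified = rectified_mean(synthetic_all, synthetic_paired, human_paired)
print("synthesis only:", naive)
print("with rectification:", rectified)
```

In this toy setup the human responses held out for the paired subset are what drive the correction, which is the budget trade-off the abstract describes: every human response spent on fine-tuning is one that cannot be used to measure and remove the remaining gap.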
