CL AI CY | Jul 9, 2025

Prompt Perturbations Reveal Human-Like Biases in Large Language Model Survey Responses

Jens Rupprecht, Georg Ahnert, Markus Strohmaier

arXiv:2507.07188v3 | 2 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This work highlights reliability issues for researchers who use LLMs as proxies for human respondents in social science surveys, showing that the models replicate human-like response biases. The advance is incremental but important for methodological improvements.

The study tested nine large language models on World Values Survey questions with various prompt perturbations, revealing that all models exhibit recency bias and sensitivity to semantic changes, with larger models showing more robustness but still affected by combined perturbations.

Large Language Models (LLMs) are increasingly used as proxies for human subjects in social science surveys, but their reliability and their susceptibility to known human-like response biases, such as central tendency, opinion floating, and primacy bias, are poorly understood. This work investigates the response robustness of LLMs in normative survey contexts. We test nine LLMs on questions from the World Values Survey (WVS), applying a comprehensive set of ten perturbations to both question phrasing and answer option structure, resulting in over 167,000 simulated survey interviews. In doing so, we not only reveal LLMs' vulnerabilities to perturbations but also show that all tested models exhibit a consistent recency bias, disproportionately favoring the last-presented answer option. While larger models are generally more robust, all models remain sensitive to semantic variations such as paraphrasing and to combined perturbations. This underscores the critical importance of prompt design and robustness testing when using LLMs to generate synthetic survey data.
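
To make the idea of an answer-option perturbation concrete, here is a minimal Python sketch of one such check: present the same question with the options in original and in reversed order, then compare how often a given option is chosen in each condition. This is an illustration only, not the authors' pipeline or metric. The function names (`build_prompt`, `reverse_order`, `share_shift`), the prompt wording, and the toy responses are all assumptions made for this example, and no model is actually called.

```python
def build_prompt(question, options):
    """Render a survey question with numbered answer options (hypothetical format)."""
    lines = [question]
    for i, opt in enumerate(options, start=1):
        lines.append(f"{i}. {opt}")
    lines.append("Answer with the number of one option.")
    return "\n".join(lines)


def reverse_order(options):
    """Answer-order perturbation: present the same options in reversed order."""
    return list(reversed(options))


def share_shift(baseline_choices, perturbed_choices, option):
    """Change in the share of responses selecting `option` between two conditions.

    Both arguments are lists of chosen option texts (not positions), so the
    comparison follows the option's content regardless of where it was shown.
    """
    base = baseline_choices.count(option) / len(baseline_choices)
    pert = perturbed_choices.count(option) / len(perturbed_choices)
    return pert - base


# Illustrative question wording and options (assumed, not quoted from the paper).
options = ["Very important", "Rather important",
           "Not very important", "Not at all important"]
original_prompt = build_prompt("How important is family in your life?", options)
reversed_prompt = build_prompt("How important is family in your life?",
                               reverse_order(options))

# Toy, made-up responses standing in for repeated model calls on each prompt.
original_runs = ["Very important", "Very important",
                 "Rather important", "Not very important"]
reversed_runs = ["Very important", "Very important",
                 "Very important", "Rather important"]

# "Very important" is shown first in the original prompt and last in the reversed one.
shift = share_shift(original_runs, reversed_runs, "Very important")
```

Under these assumptions, a positive shift for an option that moves from the first to the last position would be consistent with a recency effect, while a shift near zero would suggest the choice follows the option's content rather than its position. A real study would need many questions, many runs, and a proper statistical test.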
