CL AI CY SI | Mar 18, 2025

LLM Generated Persona is a Promise with a Catch

Ang Li, Haozhe Chen, Hongseok Namkoong, Tianyi Peng

arXiv:2503.16527v131.578 citations | h-index: 3 | Has Code

Originality: Incremental advance

AI Analysis

The work matters to researchers and practitioners in social science, economics, and marketing who rely on synthetic personas for scalable simulations, and it highlights the need for methodological rigor to prevent misleading results.

The paper examines systematic biases in LLM-generated personas used for simulations. Large-scale experiments, including U.S. presidential election forecasts and general opinion surveys, show that these biases can cause significant deviations from real-world outcomes.

The use of large language models (LLMs) to simulate human behavior has gained significant attention, particularly through personas that approximate individual characteristics. Persona-based simulations hold promise for transforming disciplines that rely on population-level feedback, including social science, economic analysis, marketing research, and business operations. Traditional methods to collect realistic persona data face significant challenges. They are prohibitively expensive and logistically challenging due to privacy constraints, and often fail to capture multi-dimensional attributes, particularly subjective qualities. Consequently, synthetic persona generation with LLMs offers a scalable, cost-effective alternative. However, current approaches rely on ad hoc and heuristic generation techniques that do not guarantee methodological rigor or simulation precision, resulting in systematic biases in downstream tasks. Through extensive large-scale experiments including presidential election forecasts and general opinion surveys of the U.S. population, we reveal that these biases can lead to significant deviations from real-world outcomes. Our findings underscore the need to develop a rigorous science of persona generation and outline the methodological innovations, organizational and institutional support, and empirical foundations required to enhance the reliability and scalability of LLM-driven persona simulations. To support further research and development in this area, we have open-sourced approximately one million generated personas, available for public access and analysis at https://huggingface.co/datasets/Tianyi-Lab/Personas.
