CY · AI · CL · Apr 24

From Demographics to Survey Anchors: Evaluating LLM Agents for Modeling Retirement Attitudes

arXiv:2605.16303 · 88.8
AI Analysis

For researchers using LLMs to simulate human survey responses, this work highlights the limitations of demographics-only agents and the need for richer contextual data.

The paper compares LLM agents defined solely by demographics with agents augmented by in-domain survey responses, testing how well each predicts human answers in the SHARE dataset. Demographics-only agents show a central tendency bias and fail to reproduce human-like errors or the interaction among predictive factors, while survey-anchored agents replicate real response patterns more closely.
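The two failure modes named above, answers pulled toward the middle of the scale and too few wrong or "don't know" answers, can be checked with simple distribution statistics. The sketch below is illustrative only: `spread_ratio` and `answer_mix` are hypothetical helpers, not the paper's metrics, and the toy answers are invented rather than drawn from SHARE.

```python
from statistics import pstdev


def spread_ratio(agent_answers, human_answers):
    """Ratio of agent to human standard deviation on an ordinal item.

    Hypothetical metric: values well below 1 mean the agent's answers are
    compressed toward the middle of the scale (central tendency bias).
    """
    human_sd = pstdev(human_answers)
    if human_sd == 0:
        raise ValueError("human answers have no spread; ratio is undefined")
    return pstdev(agent_answers) / human_sd


def answer_mix(answers, correct, dont_know="DK"):
    """Shares of correct, incorrect and don't-know answers on a knowledge item."""
    n = len(answers)
    dk = sum(a == dont_know for a in answers)
    right = sum(a == correct for a in answers)
    return {"correct": right / n, "incorrect": (n - dk - right) / n, "dont_know": dk / n}


# Toy answers on a 1-4 scale, invented for illustration (not SHARE data).
human_risk = [1, 1, 2, 3, 4, 4, 2, 3]
agent_risk = [2, 3, 2, 3, 2, 3, 2, 3]
print(round(spread_ratio(agent_risk, human_risk), 3))  # 0.447

# A knowledge question whose correct option is "A" (hypothetical coding).
print(answer_mix(["A", "B", "DK", "A", "C"], correct="A"))
print(answer_mix(["A"] * 5, correct="A"))
```

A ratio near 1 and a realistic share of incorrect and "DK" answers would indicate human-like variability; an agent that answers every knowledge item correctly is, in the abstract's words, unrealistically accurate.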

Large language model (LLM) agents may offer tools to predict human responses to surveys. A common technique for defining these agents uses only demographics, for example country, age, gender, employment status, income, education and marital status. We compare the predictive accuracy of demographic agents to that of survey agents defined with a larger set of in-domain survey responses. We test both approaches in predicting responses to the multidisciplinary, cross-national Survey of Health, Ageing and Retirement in Europe (SHARE), focusing on five variables from three policy-relevant constructs around personal finance. Across these three constructs, we observe that, compared with survey agents defined with broader data, demographics-only agents (1) exhibited a central tendency bias, skewing answers toward population means, and (2) were unrealistically accurate, failing to reproduce the incorrect answers and "don't know" responses typical of human respondents. These performance differences are further substantiated by replicating a hierarchical regression analysis from prior retirement planning research. Agents based solely on demographic information reproduce the finding that financial risk tolerance, future time perspective, and knowledge of retirement planning are each predictive of retirement savings. However, only the survey-anchored agents succeed in reproducing the interaction among these three factors. These findings suggest caution in using only demographics to define LLM agents for predicting survey responses.
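The hierarchical regression test can be sketched as two nested ordinary least squares models: step 1 with the three main effects, step 2 adding their product terms, comparing the gain in R². This is a minimal illustration, not the authors' code. It assumes mean-centred predictors and that "interaction" means two- and three-way product terms entered together; the function names are hypothetical and the data are simulated, not SHARE responses.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on X (an intercept column is added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_r2(risk, ftp, knowledge, savings):
    """Step 1: main effects only. Step 2: add two- and three-way products.

    Returns (R^2 step 1, R^2 step 2, delta R^2).
    """
    # Mean-centre predictors so product terms are less collinear with main effects.
    r, t, k = (v - v.mean() for v in (risk, ftp, knowledge))
    step1 = np.column_stack([r, t, k])
    step2 = np.column_stack([r, t, k, r * t, r * k, t * k, r * t * k])
    r2_1, r2_2 = r_squared(step1, savings), r_squared(step2, savings)
    return r2_1, r2_2, r2_2 - r2_1


def f_change(r2_1, r2_2, n, q, p2):
    """F statistic for the R^2 gain when q predictors are added,
    with p2 predictors (excluding the intercept) in the larger model."""
    return ((r2_2 - r2_1) / q) / ((1.0 - r2_2) / (n - p2 - 1))


# Simulated data with a built-in three-way interaction (illustration only).
rng = np.random.default_rng(0)
n = 500
risk, ftp, knowledge = rng.normal(size=(3, n))
savings = (0.3 * risk + 0.3 * ftp + 0.3 * knowledge
           + 0.5 * risk * ftp * knowledge + rng.normal(scale=0.5, size=n))
r2_1, r2_2, delta = hierarchical_r2(risk, ftp, knowledge, savings)
f = f_change(r2_1, r2_2, n, q=4, p2=7)
print(f"step 1 R^2={r2_1:.3f}, step 2 R^2={r2_2:.3f}, delta={delta:.3f}, F={f:.1f}")
```

Under this reading, an agent that reproduces only the main effects would show a solid step 1 R² but little or no gain at step 2, which is the pattern the abstract reports for demographics-only agents.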
