GN AI CY HC
Oct 25, 2024

Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina

Yuan Gao, Dokyun Lee, Gordon Burtch, Sina Fazelpour

arXiv:2410.19599v3
16.147 citations
h-index: 7

Originality: Incremental advance

AI Analysis

This paper raises a critical issue for social scientists and AI researchers: it cautions against using LLMs as human simulations because their failures are unpredictable. It is an incremental contribution that challenges a recent trend.

The paper tackles the problem of using large language models (LLMs) as surrogates for humans in social science research by assessing their reasoning depth with the 11-20 money request game, finding that nearly all advanced models fail to replicate human behavior distributions.

Recent studies suggest large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can be used as surrogates or simulations for humans in social science research. However, LLMs differ fundamentally from humans, relying on probabilistic patterns, absent the embodied experiences or survival objectives that shape human cognition. We assess the reasoning depth of LLMs using the 11-20 money request game. Nearly all advanced approaches fail to replicate human behavior distributions across many models. Causes of failure are diverse and unpredictable, relating to input language, roles, and safeguarding. These results advise caution when using LLMs to study human behavior or as surrogates or simulations.
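
The abstract does not spell out the rules of the 11-20 money request game, so the following is only a minimal sketch based on the game as it is commonly described in the literature (Arad and Rubinstein, 2012): each of two players requests an integer amount from 11 to 20 and receives it, plus a bonus of 20 if the request is exactly one less than the other player's. The function names, and the assumption that a level-0 player requests 20, are illustrative and are not taken from the paper.

```python
def payoff(own_request: int, other_request: int) -> int:
    """Payoff to a player in the 11-20 game (rules assumed, see lead-in)."""
    for r in (own_request, other_request):
        if not 11 <= r <= 20:
            raise ValueError("requests must be integers from 11 to 20")
    # Bonus of 20 for undercutting the other player by exactly one.
    bonus = 20 if own_request == other_request - 1 else 0
    return own_request + bonus


def best_response(other_request: int) -> int:
    """Request that maximizes payoff against a known opposing request."""
    return max(range(11, 21), key=lambda r: payoff(r, other_request))


def level_k_request(k: int, level0_request: int = 20) -> int:
    """Request of a level-k reasoner, assuming level-0 asks for level0_request.

    Each level best-responds to the level below it, so the number of
    steps below 20 is one way to read off 'reasoning depth'.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    request = level0_request
    for _ in range(k):
        request = best_response(request)
    return request
```

Under these assumptions, a request of 19 corresponds to one step of reasoning, 18 to two steps, and so on, which is why the distribution of requests can be compared between human participants and LLM outputs.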
