GT · AI · Apr 11, 2024

Do Large Language Models Learn Human-Like Strategic Preferences?

arXiv:2404.08710v2 · 8 citations · h-index: 5 · Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)
AI Analysis

This paper addresses how closely LLM decision-making aligns with human behavior in strategic scenarios, a question relevant to researchers in AI and behavioral science. Its contribution is incremental, building on existing empirical studies.

The paper investigates whether large language models (LLMs) develop human-like strategic preferences. It finds that Solar and Mistral exhibit stable, human-like preferences for cooperation in the prisoner's dilemma and the traveler's dilemma, including the stake-size effect in the former and the penalty-size effect in the latter.

In this paper, we evaluate whether LLMs learn to make human-like preference judgements in strategic scenarios as compared with known empirical results. Solar and Mistral are shown to exhibit stable value-based preference consistent with humans and exhibit human-like preference for cooperation in the prisoner's dilemma (including stake-size effect) and traveler's dilemma (including penalty-size effect). We establish a relationship between model size, value-based preference, and superficiality. Finally, results here show that models tending to be less brittle have relied on sliding window attention suggesting a potential link. Additionally, we contribute a novel method for constructing preference relations from arbitrary LLMs and support for a hypothesis regarding human behavior in the traveler's dilemma.
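For readers unfamiliar with the traveler's dilemma, the following minimal sketch shows the standard textbook payoff rule and why the size of the penalty matters. It is an illustration only, not the paper's evaluation code: the function names, the claim range of 2 to 100, and the example penalty values are assumptions chosen for this example.

```python
def travelers_dilemma_payoffs(claim_a, claim_b, penalty):
    """Return (payoff_a, payoff_b) for one round of the traveler's dilemma.

    Standard rule: equal claims are paid in full; otherwise both players
    receive the lower claim, the lower claimant gains `penalty` as a bonus
    and the higher claimant loses `penalty`.
    """
    if claim_a == claim_b:
        return claim_a, claim_b
    low = min(claim_a, claim_b)
    if claim_a < claim_b:
        return low + penalty, low - penalty
    return low - penalty, low + penalty


def best_response(opponent_claim, penalty, low=2, high=100):
    """Claim in [low, high] that maximizes own payoff against a fixed claim.

    The range 2..100 is the usual textbook choice and is assumed here.
    """
    return max(
        range(low, high + 1),
        key=lambda c: travelers_dilemma_payoffs(c, opponent_claim, penalty)[0],
    )


if __name__ == "__main__":
    # Illustrative penalty values (assumed): compare a small and a large one.
    for penalty in (2, 50):
        undercut = travelers_dilemma_payoffs(99, 100, penalty)
        print(f"penalty={penalty}: claims (99, 100) pay {undercut}")
        print(f"  best response to 100: {best_response(100, penalty)}")
```

With any penalty above 1, undercutting the other player by one unit is the best response, so the only Nash equilibrium is the lowest claim. A larger penalty makes a high claim riskier when the other player undercuts, which is the lever behind the penalty-size effect the abstract refers to.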

Code Implementations: 1 repo