AI | CY | GT | LG | Feb 27, 2025

LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory

arXiv:2502.20432v3 | 16 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses a gap in how the strategic reasoning mechanisms of LLMs are evaluated in interactive decision-making. The advance is incremental, since it builds on existing behavioral game theory concepts.

The study introduces a behavioral game theory framework to evaluate strategic reasoning in 22 large language models. GPT-o3-mini and GPT-o1 performed best, but neither model scale nor Chain-of-Thought prompting consistently improved results, and models such as GPT-4o and Gemma showed biases tied to assigned demographic features.

Strategic decision-making involves interactive reasoning where agents adapt their choices in response to others, yet existing evaluations of large language models (LLMs) often emphasize Nash Equilibrium (NE) approximation, overlooking the mechanisms driving their strategic choices. To bridge this gap, we introduce an evaluation framework grounded in behavioral game theory, disentangling reasoning capability from contextual effects. Testing 22 state-of-the-art LLMs, we find that GPT-o3-mini, GPT-o1, and DeepSeek-R1 dominate most games, yet we also show that model scale alone does not determine performance. As for prompting enhancements, Chain-of-Thought (CoT) prompting is not universally effective: it improves strategic reasoning only for models at certain levels and provides limited gains elsewhere. Additionally, we investigate the impact of encoded demographic features on the models, observing that certain assignments affect their decision-making patterns. For instance, GPT-4o shows stronger strategic reasoning with female traits than with male traits, while Gemma assigns higher reasoning levels to heterosexual identities than to other sexual orientations, indicating inherent biases. These findings underscore the need for ethical standards and contextual alignment to balance improved reasoning with fairness.
