CL · Dec 18, 2025

Emergent World Beliefs: Exploring Transformers in Stochastic Games

arXiv:2512.23722v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work asks whether the emergent world models previously observed in perfect-information games also arise in games of incomplete information. It is an incremental advance that extends those earlier findings to poker.

The researchers investigated whether transformer-based language models can develop emergent world models in domains of incomplete information, using poker as a case study, and found that the model learned both deterministic and stochastic features, such as hand ranks and equity, without explicit instruction.

Transformer-based large language models (LLMs) have demonstrated strong reasoning abilities across diverse fields, from solving programming challenges to competing in strategy-intensive games such as chess. Prior work has shown that LLMs can develop emergent world models in games of perfect information, where internal representations correspond to latent states of the environment. In this paper, we extend this line of investigation to domains of incomplete information, focusing on poker as a canonical partially observable Markov decision process (POMDP). We pretrain a GPT-style model on Poker Hand History (PHH) data and probe its internal activations. Our results demonstrate that the model learns both deterministic structure, such as hand ranks, and stochastic features, such as equity, without explicit instruction. Furthermore, using primarily nonlinear probes, we demonstrate that these representations are decodable and correlate with theoretical belief states, suggesting that LLMs learn their own representation of the stochastic environment of Texas Hold'em Poker.
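
To make the probing idea concrete, here is a minimal sketch of a nonlinear probe: a one-hidden-layer MLP trained to regress a scalar target such as equity from a model's hidden activations. This is not the authors' implementation. The probe architecture, the hyperparameters, the function name `train_mlp_probe`, and the synthetic "activations" and "equity" values are all assumptions made for illustration; in a real experiment the activations would be read from a chosen layer of the pretrained model and the targets computed from the game state.

```python
import numpy as np

def train_mlp_probe(acts, targets, hidden=32, lr=0.05, steps=500, seed=0):
    """Fit a one-hidden-layer tanh MLP probe with full-batch gradient descent on MSE.

    acts:    (n, d) array of frozen activations (hypothetical input)
    targets: (n,) array of scalar labels, e.g. equity in [0, 1]
    Returns a predict function and the per-step training losses.
    """
    rng = np.random.default_rng(seed)
    n, d = acts.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(steps):
        # Forward pass
        h = np.tanh(acts @ W1 + b1)
        pred = (h @ W2 + b2).ravel()
        err = pred - targets
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for mean squared error
        g_pred = (2.0 / n) * err[:, None]          # (n, 1)
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)     # tanh derivative
        g_W1 = acts.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Only the probe parameters are updated; the probed model stays frozen
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(x):
        return (np.tanh(x @ W1 + b1) @ W2 + b2).ravel()

    return predict, losses

# Synthetic stand-ins (assumption): random vectors play the role of activations,
# and a nonlinear function of them plays the role of equity.
data_rng = np.random.default_rng(1)
acts = data_rng.normal(size=(512, 16))
logit = 2.0 * acts[:, 0] - acts[:, 1] + 0.5 * acts[:, 0] * acts[:, 1]
equity = 1.0 / (1.0 + np.exp(-logit))

# Train on one split and evaluate on held-out examples
predict, losses = train_mlp_probe(acts[:400], equity[:400])
test_mse = float(np.mean((predict(acts[400:]) - equity[400:]) ** 2))
baseline_mse = float(np.mean((equity[:400].mean() - equity[400:]) ** 2))
print(f"probe test MSE: {test_mse:.4f}, mean-baseline MSE: {baseline_mse:.4f}")
```

Comparing the held-out error against a trivial baseline (here, predicting the training mean) is one common way to judge whether a feature is decodable from the activations. A nonlinear probe can pick up features that a linear probe would miss, at the cost of making it harder to tell how much of the computation is done by the probe rather than the model.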

