A geometric relation of the error introduced by sampling a language model's output distribution to its internal state
Provides a geometric framework linking token-level sensitivity to model internals, offering a new lens for understanding language model behavior.
The paper derives a geometric 1-form from token embeddings that captures sensitivity to single-token changes in GPT models, and shows its curvature correlates with semantic structure in chess reasoning tasks, indicating that token space geometry reflects internal problem representations.
GPT-style language models are sensitive to single-token changes at generation points where the predicted probability distribution is spread across multiple tokens. Viewing this sensitivity as a geometric property, we derive an $\mathfrak{so}(n)$-valued 1-form that depends only on the geometry of the token embeddings. Despite this purely geometric origin, we show that its curvature is semantically meaningful: On chess reasoning tasks, the curvature couples to the world model of an off-the-shelf instruction-tuned model, with transformations clustering by board region and respecting piece importance. Our findings suggest that token space geometry directly reflects how models internally represent problems.
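To make the opening claim concrete, the following minimal sketch flags generation points where the predicted distribution is spread across multiple tokens. It is an illustration, not the paper's method: Shannon entropy is assumed here as the measure of spread, and the names `softmax`, `entropy`, `is_spread`, the 0.5-nat threshold, and the example logits are all hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_spread(logits: Sequence[float], threshold_nats: float = 0.5) -> bool:
    """Assumed criterion: a distribution counts as 'spread' above the threshold."""
    return entropy(softmax(logits)) > threshold_nats


# Hypothetical next-token logits over a four-token vocabulary.
peaked = [10.0, 0.0, 0.0, 0.0]   # one token dominates
spread = [2.0, 1.9, 1.8, -5.0]   # three near-equal candidates

print(is_spread(peaked), is_spread(spread))
```

Under this assumed criterion, only the second position would be treated as one where a single-token change could redirect the generation; the threshold is arbitrary and would need tuning for a real vocabulary.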