The Geometry of Thought: Disclosing the Transformer as a Tropical Polynomial Circuit
The paper offers a new geometric perspective on chain-of-thought reasoning in AI, though the contribution is incremental because it builds on existing Transformer theory.
The paper proves that, in the high-confidence regime, the Transformer self-attention mechanism operates as a tropical polynomial circuit, revealing that it executes a dynamic programming recurrence akin to a Bellman-Ford path-finding update on token similarities.
We prove that the Transformer self-attention mechanism in the high-confidence regime ($\beta \to \infty$, where $\beta$ is an inverse temperature) operates in the tropical semiring (max-plus algebra). In particular, we show that taking the tropical limit of softmax attention converts it into a tropical matrix product. This reveals that the Transformer's forward pass effectively executes a dynamic programming recurrence (specifically, a Bellman-Ford path-finding update) on a latent graph defined by token similarities. Our theoretical result provides a new geometric perspective on chain-of-thought reasoning: it emerges from an inherent shortest-path (or longest-path) algorithm carried out within the network's computation.
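The limit described above can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes the standard log-sum-exp smoothing of the maximum, $\frac{1}{\beta}\log\sum_j e^{\beta(A_{ij}+x_j)} \to \max_j (A_{ij}+x_j)$ as $\beta \to \infty$, and uses a hypothetical 3-token similarity matrix `A`. The function names (`softmax`, `soft_max_plus`, `tropical_matvec`) and all numeric values are illustrative assumptions.

```python
import math

def softmax(scores, beta):
    """Softmax at inverse temperature beta (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_max_plus(row, x, beta):
    """Smooth maximum: (1/beta) * log sum_j exp(beta * (row[j] + x[j]))."""
    terms = [a + xj for a, xj in zip(row, x)]
    m = max(terms)
    return m + math.log(sum(math.exp(beta * (t - m)) for t in terms)) / beta

def tropical_matvec(A, x):
    """Max-plus (tropical) product: y[i] = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

# Hypothetical symmetric similarity scores between three tokens (assumption).
A = [
    [0.0, -1.0, -4.0],
    [-1.0, 0.0, -2.0],
    [-4.0, -2.0, 0.0],
]

# Path scores from token 0: only token 0 starts with a high score.
x0 = [0.0, -10.0, -10.0]

# Each tropical product is one Bellman-Ford-style relaxation (longest-path form).
x1 = tropical_matvec(A, x0)  # [0.0, -1.0, -4.0]
x2 = tropical_matvec(A, x1)  # [0.0, -1.0, -3.0]: the route 0 -> 1 -> 2 beats the direct edge

# At large beta the smooth update approaches the tropical one,
# with a gap of at most log(n) / beta per entry.
beta = 50.0
x1_soft = [soft_max_plus(row, x0, beta) for row in A]

# The attention weights themselves concentrate on the argmax.
weights = softmax([a + xj for a, xj in zip(A[2], x1)], beta)
```

In this toy setup, the second relaxation improves token 2's score from -4.0 to -3.0 by routing through token 1, which is the kind of multi-hop composition the path-finding interpretation refers to. The sign convention (maximizing similarity rather than minimizing distance) is a choice made for this sketch; negating the scores gives the shortest-path form.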