Christopher R. Lee-Jenkins

h-index1
2papers

2 Papers

LGAug 28, 2025
Manifold Trajectories in Next-Token Prediction: From Replicator Dynamics to Softmax Equilibrium

Christopher R. Lee-Jenkins

Decoding in large language models is often described as scoring tokens and normalizing with softmax. We give a minimal, self-contained account of this step as a constrained variational principle on the probability simplex. The discrete, normalization-respecting ascent is the classical multiplicative-weights (entropic mirror) update; its continuous-time limit is the replicator flow. From these ingredients we prove that, for a fixed context and temperature, the next-token distribution follows a smooth trajectory inside the simplex and converges to the softmax equilibrium. This formalizes the common ``manifold traversal'' intuition at the output-distribution level. The analysis yields precise, practice-facing consequences: temperature acts as an exact rescaling of time along the same trajectory, while top-k and nucleus sampling restrict the flow to a face with identical guarantees. We also outline a controlled account of path-dependent score adjustments and their connection to loop-like, hallucination-style behavior. We make no claims about training dynamics or internal representations; those are deferred to future work.

LGSep 17, 2025
Softmax as a Lagrangian-Legendrian Seam

Christopher R. Lee-Jenkins

This note offers a first bridge from machine learning to modern differential geometry. We show that the logits-to-probabilities step implemented by softmax can be modeled as a geometric interface: two potential-generated, conservative descriptions (from negative entropy and log-sum-exp) meet along a Legendrian "seam" on a contact screen (the probability simplex) inside a simple folded symplectic collar. Bias-shift invariance appears as Reeb flow on the screen, and the Fenchel-Young equality/KL gap provides a computable distance to the seam. We work out the two- and three-class cases to make the picture concrete and outline next steps for ML: compact logit models (projective or spherical), global invariants, and connections to information geometry where on-screen dynamics manifest as replicator flows.