Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations
This work connects classical distributional semantics with neural collapse geometry to explain how semantic structure emerges implicitly in language models, which makes it relevant to researchers in NLP and AI interpretability.
The paper studies how next-token prediction optimization organizes semantic structure in language models. It shows that models implicitly factor context-to-next-token co-occurrence patterns via SVD and that concepts associated with larger singular values are learned earlier in training. The findings are validated on synthetic data and pretrained models, where the analysis recovers semantic categories such as grammatical types and named entities.
We investigate how next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. Our analysis, based on a tractable mathematical model and controlled synthetic data, reveals that NTP implicitly guides models to factor a centered support matrix encoding context-to-next-token co-occurrence patterns via singular value decomposition (SVD). While models never explicitly construct this matrix, learned word and context embeddings converge to its SVD factors, with singular vectors encoding latent semantic concepts through their sign patterns. We demonstrate that concepts corresponding to larger singular values are learned earlier during training, yielding a natural semantic hierarchy where broad categories emerge before fine-grained ones. This insight motivates orthant-based clustering, a method that combines concept signs to identify interpretable semantic categories. We validate our findings on synthetic datasets and pretrained language models, recovering diverse semantic structures such as grammatical categories, named entity types, and topical distinctions (medical, entertainment). Our work bridges classical distributional semantics and neural collapse geometry, characterizing how gradient-based optimization implicitly determines both the matrix representation and factorization method that encode semantic structure.
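As a rough illustration of the orthant-based clustering idea described above, the following sketch builds a small binary support matrix, centers it, takes its SVD, and groups tokens by the signs of their entries in the top singular vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `orthant_clusters`, the rows-as-tokens layout, the choice to center each context column over the vocabulary, and the toy data are all hypothetical.

```python
import numpy as np


def orthant_clusters(support, num_concepts):
    """Group tokens by the sign pattern of their top singular-vector entries.

    support: (vocab_size, num_contexts) 0/1 array; support[z, j] = 1 when
    token z is observed after context j (an assumed layout).
    """
    support = np.asarray(support, dtype=float)
    # Assumed centering: subtract each context's mean over the vocabulary.
    centered = support - support.mean(axis=0, keepdims=True)
    # Left singular vectors act as word-side concept directions here.
    U, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
    signs = np.sign(U[:, :num_concepts]).astype(int)
    clusters = {}
    for token, pattern in enumerate(signs):
        clusters.setdefault(tuple(pattern.tolist()), []).append(token)
    return singular_values, clusters


# Hypothetical toy corpus: 4 tokens, 8 contexts.
# Six contexts separate tokens {0, 1} from {2, 3} (a broad split);
# two contexts separate {0, 2} from {1, 3} (a finer split).
toy_support = np.array([
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 1],
])

singular_values, coarse = orthant_clusters(toy_support, num_concepts=1)
_, fine = orthant_clusters(toy_support, num_concepts=2)

# Singular-vector signs are arbitrary, so compare partitions, not sign keys.
coarse_partition = {frozenset(members) for members in coarse.values()}
fine_partition = {frozenset(members) for members in fine.values()}
```

In this toy setup the broad split has the larger singular value, so using one concept separates tokens {0, 1} from {2, 3}, and adding the second concept refines each group further. This mirrors the coarse-to-fine hierarchy the abstract describes, though only for this hand-built example.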