LG · AI · CL · Sep 27, 2025

Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Melody Zixuan Li, Kumar Krishna Agrawal, Arna Ghosh, Komal Kumar Teru, Adam Santoro, Guillaume Lajoie, Blake A. Richards

arXiv:2509.23024v1 · h-index: 7

Originality: Highly original

AI Analysis

This work provides geometric explanations for capability emergence in language models, which could help researchers understand and optimize training dynamics.

The researchers investigated how the geometry of language model representations evolves during pretraining and post-training, discovering a consistent three-phase sequence of geometric transformations that correlates with performance changes. Pretraining passes through collapse, expansion, and compression phases, and different post-training methods produce distinct geometric effects: SFT and DPO improve in-distribution performance but reduce out-of-distribution robustness, whereas RLVR enhances reward alignment but decreases generation diversity.

Standard training metrics like loss fail to explain the emergence of complex capabilities in large language models. We take a spectral approach to investigate the geometry of learned representations across pretraining and post-training, measuring effective rank (RankMe) and eigenspectrum decay ($\alpha$-ReQ). With OLMo (1B-7B) and Pythia (160M-12B) models, we uncover a consistent non-monotonic sequence of three geometric phases during autoregressive pretraining. The initial "warmup" phase exhibits rapid representational collapse. This is followed by an "entropy-seeking" phase, where the manifold's dimensionality expands substantially, coinciding with peak n-gram memorization. Subsequently, a "compression-seeking" phase imposes anisotropic consolidation, selectively preserving variance along dominant eigendirections while contracting others, a transition marked by significant improvement in downstream task performance. We show these phases can emerge from a fundamental interplay of cross-entropy optimization under skewed token frequencies and representational bottlenecks ($d \ll |V|$). Post-training further transforms geometry: SFT and DPO drive "entropy-seeking" dynamics to integrate specific instructional or preferential data, improving in-distribution performance while degrading out-of-distribution robustness. Conversely, RLVR induces "compression-seeking" dynamics, enhancing reward alignment but reducing generation diversity.
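The two spectral measures named in the abstract can be computed from a matrix of hidden states. Below is a minimal NumPy sketch, assuming representations are collected as an (n_samples, d) array (for example, one hidden vector per token from a single layer). RankMe follows the usual definition (exponentiated Shannon entropy of the L1-normalized singular values), and $\alpha$ is estimated here by a least-squares fit of log-eigenvalue against log-rank over the whole covariance spectrum. The centering step, the fitting range, the helper `synthetic_reps`, and the toy spectra are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np


def rankme(reps: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank: exp of the Shannon entropy of L1-normalized singular values."""
    s = np.linalg.svd(reps, compute_uv=False)
    p = s / s.sum() + eps  # eps guards log(0) for numerically zero singular values
    return float(np.exp(-np.sum(p * np.log(p))))


def alpha_req(reps: np.ndarray, rel_tol: float = 1e-10) -> float:
    """Decay exponent alpha, assuming covariance eigenvalues follow lambda_i ~ i^(-alpha)."""
    x = reps - reps.mean(axis=0, keepdims=True)  # center before taking the covariance spectrum
    # Squared singular values / (n - 1) are the sample-covariance eigenvalues, in descending order.
    eig = np.linalg.svd(x, compute_uv=False) ** 2 / (x.shape[0] - 1)
    eig = eig[eig > rel_tol * eig[0]]  # drop numerically zero directions before taking logs
    ranks = np.arange(1, eig.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eig), deg=1)
    return float(-slope)


def synthetic_reps(variances: np.ndarray, n: int = 4000, seed: int = 0) -> np.ndarray:
    """Hypothetical toy data: Gaussian samples with the given per-axis variances."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, variances.size)) * np.sqrt(variances)


d = 64
power_law = 1.0 / np.arange(1, d + 1)  # lambda_i = i^(-1), i.e. alpha = 1
compressed = power_law.copy()
compressed[8:] *= 0.09                 # keep the top 8 directions, shrink the rest (std x 0.3)
isotropic = np.ones(d)                 # maximally spread spectrum

results = {}
for name, variances in [("isotropic", isotropic), ("power_law", power_law), ("compressed", compressed)]:
    reps = synthetic_reps(variances)
    results[name] = (rankme(reps), alpha_req(reps))
    print(f"{name:>10}: RankMe = {results[name][0]:6.2f}, alpha = {results[name][1]:5.2f}")
```

On this toy data, contracting the non-dominant directions (the `compressed` case) should lower RankMe and raise $\alpha$ relative to the power-law baseline, a rough analogue of the anisotropic "compression-seeking" signature, while the isotropic case sits at the high-rank, low-$\alpha$ extreme.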

