LG · May 28

Representation Collapse in Sequential Post-Training of Large Language Models

arXiv:2605.30524 · 49.4 · h-index: 18
Predicted impact top 51% in LG · last 90 days
Originality: Incremental advance
AI Analysis

This research addresses a potential problem for developers and researchers who sequentially post-train large language models: each additional stage may reduce model performance and adaptability.

This paper investigates whether sequential post-training of large language models leads to the compression of internal representations into low-rank, anisotropic, and homogeneous feature spaces. The authors hypothesize that this representation collapse predicts reduced plasticity, weaker out-of-domain generalization, and poorer calibration in later adaptation stages.

Large language models are now adapted through chains of post-training stages rather than through a single instruction-tuning pass. This paper studies whether such sequential post-training gradually compresses internal representations into low-rank, anisotropic, and homogeneous feature spaces. We define a measurement suite for hidden states, logits, token trajectories, and LoRA updates, and we use it to analyze supervised fine-tuning, preference optimization, safety/refusal tuning, math and code specialization, and long chain-of-thought tuning under controlled stage orderings. The central hypothesis is that excessive representation concentration is not merely a geometric curiosity: it predicts reduced plasticity during later adaptation, weaker out-of-domain generalization, and poorer calibration. We further evaluate lightweight interventions, including mixed-domain replay, feature refresh, representation diversity regularization, and LoRA update decorrelation, as ways to preserve future learnability without giving up the behavioral gains of post-training.
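To make "low-rank" and "anisotropic" concrete, here is a minimal sketch of two common geometric proxies computed on a matrix of hidden states (one row per token). The abstract does not specify the paper's actual metrics, so the choice of an entropy-based effective rank and mean pairwise cosine similarity, along with the function names `effective_rank` and `mean_cosine`, are assumptions made for illustration only.

```python
import numpy as np


def effective_rank(hidden: np.ndarray) -> float:
    """Entropy-based effective rank of a (tokens x dims) hidden-state matrix.

    Illustrative proxy for "low-rank" collapse; not the paper's definition.
    """
    # Center the features so a shared offset does not count as a direction.
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / s.sum()          # normalized singular-value distribution
    p = p[p > 0]             # drop exact zeros before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))


def mean_cosine(hidden: np.ndarray) -> float:
    """Mean pairwise cosine similarity between rows.

    Illustrative proxy for anisotropy: values near 1 mean all vectors
    point in roughly the same direction.
    """
    unit = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = hidden.shape[0]
    # Average over off-diagonal entries only.
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for hidden states (hypothetical data, not model output).
    isotropic = rng.normal(size=(200, 32))
    collapsed = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 32)) + 5.0
    for name, h in [("isotropic", isotropic), ("collapsed", collapsed)]:
        print(name, effective_rank(h), mean_cosine(h))
```

Under these assumptions, a collapsed representation would show a lower effective rank and a higher mean cosine similarity than an isotropic one. Tracking both values layer by layer after each post-training stage is one plausible way to watch for the concentration the abstract describes.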

