CL, AI | Sep 21, 2025

Evolution of Concepts in Language Model Pre-Training

Xuyang Ge, Wentao Shu, Jiaxing Wu, Yunhua Zhou, Zhengfu He, Xipeng Qiu

arXiv:2509.17196v1 | 13.07 citations | h-index: 15

Originality: Incremental advance

AI Analysis

This work incrementally advances understanding of the black-box pre-training process for language models by providing fine-grained tracking of representation dynamics.

The researchers used crosscoders to track how interpretable features evolve during language model pre-training. They found that most features begin to form around a specific point in training, that more complex patterns emerge in later stages, and that feature attribution reveals causal connections between feature evolution and downstream performance.

Language models obtain extensive capabilities through pre-training. However, the pre-training process remains a black box. In this work, we track linear interpretable feature evolution across pre-training snapshots using a sparse dictionary learning method called crosscoders. We find that most features begin to form around a specific point, while more complex patterns emerge in later training stages. Feature attribution analyses reveal causal connections between feature evolution and downstream performance. Our feature-level observations are highly consistent with previous findings on the Transformer's two-stage learning process, which we term a statistical learning phase and a feature learning phase. Our work opens up the possibility of tracking fine-grained representation progress during language model learning dynamics.
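The abstract names crosscoders without describing how they operate across snapshots. Below is a minimal NumPy sketch, assuming the generic crosscoder formulation: activations of the same tokens at several snapshots are encoded into one shared sparse code, each snapshot gets its own decoder, and the sparsity penalty is weighted by the decoder norms. It is not the authors' implementation. The function names, the ReLU activation, the `l1_coeff` value, and the `formation_snapshot` heuristic (first snapshot where a feature's decoder norm reaches half its peak) are all hypothetical choices made for illustration.

```python
import numpy as np

def init_crosscoder(n_snapshots, d_model, n_features, seed=0):
    """Hypothetical random initialisation of crosscoder parameters (illustrative only)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    return {
        # One encoder matrix per snapshot: (S, d_model, F)
        "W_enc": rng.normal(0.0, scale, size=(n_snapshots, d_model, n_features)),
        "b_enc": np.zeros(n_features),
        # One decoder matrix per snapshot: (S, F, d_model)
        "W_dec": rng.normal(0.0, scale, size=(n_snapshots, n_features, d_model)),
        "b_dec": np.zeros((n_snapshots, d_model)),
    }

def encode(params, acts):
    """acts: (batch, S, d_model), the same tokens' activations at S snapshots.
    Sums the per-snapshot encoder contributions into one shared code: (batch, F)."""
    pre = np.einsum("bsd,sdf->bf", acts, params["W_enc"]) + params["b_enc"]
    return np.maximum(pre, 0.0)  # ReLU (assumed) yields a non-negative code

def decode(params, f):
    """Reconstructs every snapshot's activations from the shared code: (batch, S, d_model)."""
    return np.einsum("bf,sfd->bsd", f, params["W_dec"]) + params["b_dec"]

def decoder_norms(params):
    """L2 norm of each feature's decoder vector at each snapshot: (S, F)."""
    return np.linalg.norm(params["W_dec"], axis=2)

def crosscoder_loss(params, acts, l1_coeff=1e-3):
    """Reconstruction error summed over snapshots plus a decoder-norm-weighted L1 penalty."""
    f = encode(params, acts)
    recon = decode(params, f)
    mse = np.sum((acts - recon) ** 2, axis=(1, 2)).mean()
    weight = decoder_norms(params).sum(axis=0)  # (F,): summed over snapshots
    sparsity = (f * weight).sum(axis=1).mean()
    return mse + l1_coeff * sparsity, f

def formation_snapshot(norms, frac=0.5):
    """Hypothetical heuristic: index of the first snapshot where a feature's
    decoder norm reaches `frac` of its maximum across snapshots. Returns (F,)."""
    peak = norms.max(axis=0, keepdims=True)
    return np.argmax(norms >= frac * peak, axis=0)

if __name__ == "__main__":
    S, d, F = 4, 16, 64  # snapshots, model width, dictionary size (toy values)
    params = init_crosscoder(S, d, F)
    acts = np.random.default_rng(1).normal(size=(8, S, d))
    loss, f = crosscoder_loss(params, acts)
    print(f"loss={loss:.3f}, code shape={f.shape}")
    print("formation snapshot of first 5 features:",
          formation_snapshot(decoder_norms(params))[:5])
```

No training loop is shown; a real run would fit the parameters by gradient descent on `crosscoder_loss` over many tokens. The design choice worth noting is that each feature has one decoder vector per snapshot, so reading its decoder norms as a trajectory across snapshots is one way to ask when that feature appears during pre-training.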
