Revealing Language Model Trajectories via Kullback-Leibler Divergence
The paper gives researchers insight into language model behavior, but the contribution is incremental because it builds on an existing analysis method.
The paper systematically evaluates KL divergence to analyze language model trajectories, finding spiral structures during pretraining and thread-like progressions across layers, with model trajectories in log-likelihood space being more constrained than in weight space.
A recently proposed method enables efficient estimation of the KL divergence between language models, including models with different architectures, by assigning coordinates based on log-likelihood vectors. To better understand the behavior of this metric, we systematically evaluate KL divergence across a wide range of conditions using publicly available language models. Our analysis covers comparisons between pretraining checkpoints, fine-tuned and base models, and layers via the logit lens. We find that trajectories of language models, as measured by KL divergence, exhibit a spiral structure during pretraining and thread-like progressions across layers. Furthermore, we show that, in terms of diffusion exponents, model trajectories in the log-likelihood space are more constrained than those in weight space.
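The two quantities the abstract relies on can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each model is represented by its log-likelihoods on a shared set of N texts, that the coordinates are obtained by double-centering the resulting models-by-texts matrix, and that KL divergence is approximated by the squared Euclidean distance between coordinates divided by 2N. It also assumes that the diffusion exponent is the log-log slope of squared displacement from the first checkpoint against step index. The function names `double_center`, `kl_proxy`, and `diffusion_exponent` are hypothetical.

```python
import math


def double_center(loglik):
    """Row- and column-center a models x texts log-likelihood matrix.

    Assumption: coordinates are the double-centered log-likelihood vectors.
    """
    n_models, n_texts = len(loglik), len(loglik[0])
    row_means = [sum(row) / n_texts for row in loglik]
    col_means = [
        sum(loglik[i][s] for i in range(n_models)) / n_models
        for s in range(n_texts)
    ]
    grand_mean = sum(row_means) / n_models
    return [
        [loglik[i][s] - row_means[i] - col_means[s] + grand_mean
         for s in range(n_texts)]
        for i in range(n_models)
    ]


def kl_proxy(q_i, q_j):
    """Squared Euclidean distance between two coordinate vectors over 2N.

    Assumption: the 1/(2N) scaling is an illustrative normalization.
    """
    n_texts = len(q_i)
    return sum((a - b) ** 2 for a, b in zip(q_i, q_j)) / (2 * n_texts)


def diffusion_exponent(trajectory):
    """Least-squares slope of log squared displacement vs. log step index.

    Displacement is measured from the first point. A slope near 1 indicates
    ordinary diffusion, near 2 ballistic motion, below 1 constrained motion.
    Requires at least two later points that differ from the first point.
    """
    start = trajectory[0]
    log_t, log_d2 = [], []
    for t in range(1, len(trajectory)):
        d2 = sum((a - b) ** 2 for a, b in zip(trajectory[t], start))
        if d2 > 0:  # log is undefined for zero displacement
            log_t.append(math.log(t))
            log_d2.append(math.log(d2))
    mean_t = sum(log_t) / len(log_t)
    mean_d = sum(log_d2) / len(log_d2)
    s_tt = sum((x - mean_t) ** 2 for x in log_t)
    s_td = sum((x - mean_t) * (y - mean_d) for x, y in zip(log_t, log_d2))
    return s_td / s_tt


# Toy data: 3 hypothetical models scored on 3 shared texts.
toy_loglik = [
    [-1.0, -2.0, -3.0],
    [-1.5, -2.5, -2.0],
    [-3.0, -1.0, -2.0],
]
coords = double_center(toy_loglik)
pairwise = kl_proxy(coords[0], coords[1])

# Toy trajectory moving in a straight line (ballistic motion).
straight = [[t * 1.0, t * 2.0] for t in range(6)]
alpha = diffusion_exponent(straight)
```

Under these assumptions the same `diffusion_exponent` function can be applied to a sequence of checkpoint coordinates in log-likelihood space and to the corresponding flattened weight vectors, so the two exponents can be compared directly.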