CL | May 21, 2025

Revealing Language Model Trajectories via Kullback-Leibler Divergence

arXiv:2505.15353v1 | 2 citations | h-index: 6 | IEEE Internet of Things Journal
Originality: Synthesis-oriented
AI Analysis

The paper gives researchers insight into language model behavior, but the contribution is incremental: it evaluates an existing KL divergence estimation method rather than proposing a new one.

The paper systematically evaluates KL divergence to analyze language model trajectories, finding spiral structures during pretraining and thread-like progressions across layers, with model trajectories in log-likelihood space being more constrained than in weight space.

A recently proposed method enables efficient estimation of the KL divergence between language models, including models with different architectures, by assigning coordinates based on log-likelihood vectors. To better understand the behavior of this metric, we systematically evaluate KL divergence across a wide range of conditions using publicly available language models. Our analysis covers comparisons between pretraining checkpoints, fine-tuned and base models, and layers via the logit lens. We find that trajectories of language models, as measured by KL divergence, exhibit a spiral structure during pretraining and thread-like progressions across layers. Furthermore, we show that, in terms of diffusion exponents, model trajectories in the log-likelihood space are more constrained than those in weight space.
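
The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it rests on assumptions the abstract does not state: each model is represented by its vector of log-likelihoods over a shared set of N texts, the matrix of these vectors is double-centered to obtain coordinates, and the squared Euclidean distance divided by 2N is used as the KL proxy (the scaling constant is an assumption). The diffusion exponent is taken here in its common sense, the slope of log squared displacement against log step count. The function names are hypothetical.

```python
import numpy as np


def loglik_coordinates(loglik):
    """Double-center a (K models x N texts) log-likelihood matrix.

    Assumption: coordinates are obtained by removing the per-model mean
    (row centering) and the per-text mean (column centering).
    """
    L = np.asarray(loglik, dtype=float)
    row_mean = L.mean(axis=1, keepdims=True)
    col_mean = L.mean(axis=0, keepdims=True)
    return L - row_mean - col_mean + L.mean()


def pairwise_kl_proxy(coords):
    """Squared Euclidean distance between coordinates, scaled by 1 / (2N).

    The 1 / (2N) factor is an assumed normalization, not taken from the text.
    """
    Q = np.asarray(coords, dtype=float)
    n_texts = Q.shape[1]
    diff = Q[:, None, :] - Q[None, :, :]
    return (diff ** 2).sum(axis=-1) / (2.0 * n_texts)


def diffusion_exponent(trajectory):
    """Fit alpha in ||x_t - x_0||^2 ~ t^alpha by least squares in log-log space.

    trajectory: (T x D) array of points ordered by training step or layer.
    alpha near 1 indicates random-walk-like motion, alpha near 2 directed motion.
    """
    X = np.asarray(trajectory, dtype=float)
    steps = np.arange(1, X.shape[0])
    sq_disp = ((X[1:] - X[0]) ** 2).sum(axis=1)
    slope, _intercept = np.polyfit(np.log(steps), np.log(sq_disp), 1)
    return slope


# Toy data only: 4 hypothetical checkpoints scored on 50 texts.
rng = np.random.default_rng(0)
toy_loglik = rng.normal(loc=-100.0, scale=5.0, size=(4, 50))
toy_coords = loglik_coordinates(toy_loglik)
toy_kl = pairwise_kl_proxy(toy_coords)
toy_alpha = diffusion_exponent(toy_coords)
```

Applying `diffusion_exponent` to checkpoint coordinates in log-likelihood space and, separately, to flattened weight vectors of the same checkpoints is one way to compare how constrained the two trajectories are, under the assumptions above.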
