Embryology of a Language Model
This work offers a holistic lens on how complex neural networks develop their internal structure during training, a central problem in the science of deep learning and one of direct interest to interpretability researchers.
The authors study how language models develop internal computational structure by taking an embryological approach: they apply UMAP to the susceptibility matrix to visualize the model's structural development over training. The visualizations reveal the emergence of a clear 'body plan,' charting the formation of known features such as the induction circuit and uncovering a previously unknown 'spacing fin' dedicated to counting space tokens.
Understanding how language models develop their internal computational structure is a central problem in the science of deep learning. While susceptibilities, drawn from statistical physics, offer a promising analytical tool, their full potential for visualizing network organization remains untapped. In this work, we introduce an embryological approach, applying UMAP to the susceptibility matrix to visualize the model's structural development over training. Our visualizations reveal the emergence of a clear ``body plan,'' charting the formation of known features like the induction circuit and discovering previously unknown structures, such as a ``spacing fin'' dedicated to counting space tokens. This work demonstrates that susceptibility analysis can move beyond validation to uncover novel mechanisms, providing a powerful, holistic lens for studying the developmental principles of complex neural networks.
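As a rough illustration of the pipeline the abstract describes, the sketch below embeds the rows of a susceptibility matrix in two dimensions, once per training checkpoint. It is a minimal sketch under stated assumptions, not the authors' implementation: the matrix layout (rows are data samples, columns are model components), the column standardization step, the per-checkpoint fitting, the UMAP settings, and the names `standardize_columns` and `embed_checkpoints` are all hypothetical choices made for illustration. The random matrices stand in for real susceptibilities, which this sketch does not compute.

```python
import numpy as np


def standardize_columns(S, eps=1e-12):
    """Z-score each column so that no single column dominates distances.

    Assumption: columns index model components. This preprocessing step is
    an illustrative choice and is not taken from the abstract.
    """
    S = np.asarray(S, dtype=float)
    mu = S.mean(axis=0, keepdims=True)
    sigma = S.std(axis=0, keepdims=True)
    return (S - mu) / (sigma + eps)


def embed_checkpoints(matrices, make_reducer=None):
    """Embed the rows of each checkpoint's matrix in 2D.

    `make_reducer` is a zero-argument factory returning any object with a
    `fit_transform` method. By default it builds a UMAP reducer from the
    umap-learn package, imported lazily so the rest of the sketch works
    without that package installed.
    """
    if make_reducer is None:
        import umap  # provided by the umap-learn package

        def make_reducer():
            return umap.UMAP(n_components=2, random_state=0)

    # One independent fit per checkpoint (an assumption of this sketch).
    return [make_reducer().fit_transform(standardize_columns(S)) for S in matrices]


# Synthetic stand-ins: 3 checkpoints, 200 samples, 16 components each.
rng = np.random.default_rng(0)
checkpoints = [rng.normal(size=(200, 16)) for _ in range(3)]

# With umap-learn installed, the embeddings would be obtained with:
# embeddings = embed_checkpoints(checkpoints)
```

Because each checkpoint is fitted independently here, the resulting coordinates are not aligned across checkpoints; comparing shapes over training would need an alignment step or a shared fit, which this sketch leaves out.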