LGAINE · Jul 28, 2025

Universal Neurons in GPT-2: Emergence, Persistence, and Functional Impact

arXiv:2508.00903v2 · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the problem of understanding stable representational structures in language models, a question relevant to AI interpretability. Its contribution is incremental, since it builds on existing neuron-analysis methods.

The study investigates universal neurons in GPT-2 Small models and finds that they emerge and persist across training. Ablation experiments show that these neurons significantly affect model predictions, as measured by cross-entropy loss.
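The ablation measurement can be illustrated with a toy sketch. It assumes zero-ablation (forcing one neuron's activation to 0) and uses a hypothetical linear readout `w_out` as a stand-in for GPT-2's downstream computation; the summary does not say whether the study used zero- or mean-ablation, and every name below is illustrative. A neuron's effect is scored as the mean increase in cross-entropy loss when it is ablated.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def forward(activations, w_out, ablate=None, ablate_value=0.0):
    """Hypothetical toy readout: logits = w_out @ activations.

    If `ablate` is a neuron index, that neuron's activation is replaced by
    `ablate_value` (0.0 = zero-ablation; pass the neuron's mean for mean-ablation).
    """
    acts = list(activations)
    if ablate is not None:
        acts[ablate] = ablate_value
    return [sum(w * a for w, a in zip(row, acts)) for row in w_out]

def ablation_delta(batch, w_out, neuron, ablate_value=0.0):
    """Mean cross-entropy with `neuron` ablated minus the unablated mean.

    batch: list of (activation vector, target token id) pairs.
    A positive value means the neuron helps the model predict the targets.
    """
    base = sum(cross_entropy(forward(a, w_out), t) for a, t in batch) / len(batch)
    ablated = sum(
        cross_entropy(forward(a, w_out, neuron, ablate_value), t) for a, t in batch
    ) / len(batch)
    return ablated - base
```

For example, with `w_out = [[2.0, 0.0], [0.0, 0.1]]` (neuron 0 strongly promotes token 0, neuron 1 weakly promotes token 1) and a batch whose targets are all token 0, ablating neuron 0 raises the loss, while ablating neuron 1 slightly lowers it because it removes support for a competing token.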

We investigate the phenomenon of neuron universality in independently trained GPT-2 Small models, examining how universal neurons (neurons whose activations are consistently correlated across models) emerge and evolve throughout training. By analyzing five GPT-2 models at five checkpoints, we identify universal neurons through pairwise correlation analysis of activations over a dataset of 5 million tokens. Ablation experiments reveal significant functional impacts of universal neurons on model predictions, measured via cross-entropy loss. Additionally, we quantify neuron persistence, demonstrating high stability of universal neurons across training checkpoints, particularly in early and deeper layers. These findings suggest that stable and universal representational structures emerge during language model training.
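The identification and persistence steps can be sketched in plain Python. The following are assumptions, not details taken from the paper: each neuron is represented by its activation trace over a token set shared by all models; a neuron in a reference model counts as universal if its best (signed) Pearson match in every other model reaches a hypothetical threshold of 0.5; and persistence is scored as the Jaccard overlap of universal-neuron sets between consecutive checkpoints. Function names and the threshold are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length activation traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant (dead) neuron is treated as uncorrelated
    return cov / math.sqrt(vx * vy)

def max_correlations(acts_a, acts_b):
    """For each neuron in model A, its best signed correlation with any neuron in model B.

    acts_a, acts_b: lists of per-neuron activation traces over the same tokens.
    """
    return [max(pearson(a, b) for b in acts_b) for a in acts_a]

def universal_neurons(models, ref=0, threshold=0.5):
    """Indices of neurons in models[ref] whose best match reaches `threshold`
    in every other model (hypothetical criterion and threshold)."""
    keep = set(range(len(models[ref])))
    for j, other in enumerate(models):
        if j == ref:
            continue
        scores = max_correlations(models[ref], other)
        keep &= {i for i, s in enumerate(scores) if s >= threshold}
    return keep

def persistence(universal_by_checkpoint):
    """Jaccard overlap of universal-neuron sets between consecutive checkpoints."""
    scores = []
    for prev, cur in zip(universal_by_checkpoint, universal_by_checkpoint[1:]):
        union = prev | cur
        scores.append(len(prev & cur) / len(union) if union else 1.0)
    return scores
```

Looping over every neuron pair is only practical for toy inputs; over 5 million tokens and thousands of neurons per layer, a real analysis would compute the cross-model correlation matrix with vectorized array operations. Whether to rank matches by signed or absolute correlation is a further design choice this sketch does not settle.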

