AI · May 19, 2025

Emergent Specialization: Rare Token Neurons in Language Models

arXiv:2505.12822v2 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the problem of rare token representation in language models for specialized domains. The advance is incremental because it builds on existing understanding of neuron specialization.

The study identified rare token neurons in language models that strongly influence predictions of rare tokens, showing they emerge dynamically during training with a three-phase organization and form a coordinated subnetwork. This specialization correlates with heavy-tailed weight distributions, suggesting a statistical mechanical basis.

Large language models struggle with representing and generating rare tokens despite their importance in specialized domains. In this study, we identify neuron structures with exceptionally strong influence on a language model's prediction of rare tokens, termed rare token neurons, and investigate the mechanism for their emergence and behavior. These neurons exhibit a characteristic three-phase organization (plateau, power-law, and rapid decay) that emerges dynamically during training, evolving from a homogeneous initial state to a functionally differentiated architecture. In the activation space, rare token neurons form a coordinated subnetwork that selectively co-activates while avoiding co-activation with other neurons. This functional specialization potentially correlates with the development of heavy-tailed weight distributions, suggesting a statistical mechanical basis for emergent specialization.
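
To make the phrase "heavy-tailed weight distributions" concrete, here is a minimal sketch of one generic way to quantify tail heaviness: the Hill estimator of the tail index. This is not the paper's method, which the abstract does not specify. The function name `hill_tail_index`, the cutoff `k`, and the synthetic samples standing in for neuron weights are all assumptions made for illustration. A smaller estimated index indicates a heavier tail.

```python
import math
import random


def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest magnitudes.

    Hypothetical helper for illustration; `k` (how many of the largest
    magnitudes count as "the tail") is a free choice, not from the paper.
    """
    mags = sorted((abs(v) for v in values if v != 0), reverse=True)
    if not 0 < k < len(mags):
        raise ValueError("k must satisfy 0 < k < number of nonzero values")
    threshold = mags[k]  # the (k+1)-th largest magnitude
    # Sum of log-ratios of the top-k magnitudes to the threshold.
    log_excess = sum(math.log(m / threshold) for m in mags[:k])
    return k / log_excess


rng = random.Random(0)
# Synthetic stand-ins for weights (NOT data from the paper).
heavy = [rng.paretovariate(2.0) for _ in range(20000)]  # Pareto tail, alpha = 2
light = [rng.gauss(0.0, 1.0) for _ in range(20000)]     # Gaussian, light tail

alpha_heavy = hill_tail_index(heavy, k=1000)
alpha_light = hill_tail_index(light, k=1000)
print(f"heavy-tailed sample: alpha ~ {alpha_heavy:.2f}")
print(f"Gaussian sample:     alpha ~ {alpha_light:.2f}")
```

On these synthetic samples the Pareto estimate should land near its true index of 2, while the Gaussian sample should yield a noticeably larger value. Applied to real model weights, the estimate depends on the choice of `k`, so it is worth checking how stable it is across several cutoffs.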

