NC · AI · May 7

Features have life history. And we should care

arXiv:2605.18789 · 38.9
Predicted impact: top 35% in NC · last 90 days · Originality: Incremental advance
AI Analysis

For researchers studying neural network interpretability and training dynamics, this work provides evidence that feature life history matters and that a small set of early-emerging features forms a backbone for later representation, but the findings are limited to two small models and may not generalize.

This paper identifies a 'carrier scaffold' of ~50 sparse features with stable life histories in two language models, Pythia-160M and -410M. The scaffold emerges early in training and organizes the models' representational structure. It is load-bearing, its members are predictable from early firing patterns, and by the end of training it has recruited 64% of active features into its hierarchy. The authors read this as evidence for a two-phase training process: selection in roughly the first 1% of training, followed by calibration in the remaining 99%.
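One way to make the "first 1% versus remaining 99%" comparison concrete is to count feature births and deaths between consecutive checkpoints and normalise by how much training each interval covers. The sketch below is a minimal illustration under stated assumptions, not the paper's method: checkpoints are given as hypothetical `(training fraction, set of active feature IDs)` pairs, features are assumed to be already matched across checkpoints (same ID means same feature), and the names `turnover_events`, `turnover_rate`, and `early_late_ratio` are invented for illustration. It counts only births and deaths, not reorganisation.

```python
from typing import List, Set, Tuple

# Hypothetical input format: (training fraction in [0, 1], set of active feature IDs).
# Assumes features are already matched across checkpoints (same ID = same feature).
Checkpoint = Tuple[float, Set[int]]


def turnover_events(prev: Set[int], curr: Set[int]) -> int:
    """Births (newly active features) plus deaths (features no longer active)."""
    return len(curr - prev) + len(prev - curr)


def turnover_rate(checkpoints: List[Checkpoint], start: float, end: float) -> float:
    """Turnover events per unit training fraction, using only checkpoint pairs inside [start, end]."""
    events, span = 0, 0.0
    for (t0, s0), (t1, s1) in zip(checkpoints, checkpoints[1:]):
        if t0 >= start and t1 <= end:
            events += turnover_events(s0, s1)
            span += t1 - t0
    if span == 0.0:
        raise ValueError("no checkpoint pair lies inside the window")
    return events / span


def early_late_ratio(checkpoints: List[Checkpoint], split: float = 0.01) -> float:
    """How many times faster features turn over before `split` than after it."""
    return turnover_rate(checkpoints, 0.0, split) / turnover_rate(checkpoints, split, 1.0)


if __name__ == "__main__":
    # Toy data, not results from the paper.
    toy = [
        (0.0, {0, 1, 2}),
        (0.005, {1, 2, 3, 4}),
        (0.01, {2, 3, 4, 5}),
        (0.5, {2, 3, 4, 6}),
        (1.0, {2, 3, 4, 6}),
    ]
    print(round(early_late_ratio(toy), 1))  # 247.5
```

A checkpoint pair that straddles the split is ignored, so the checkpoint list should include the split point itself. Because checkpoints are usually spaced unevenly, normalising by training fraction rather than by checkpoint count is what keeps the two windows comparable.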

Features in language models have life history: they emerge, persist, and die during training, yet the importance of that history remains largely unexplored. We find evidence of a persistent representational backbone, which we identify in Pythia-160M and -410M as the carrier scaffold: ~50 sparse features with stable life histories, around which the model's representational structure organises. It has four properties. (i) It assembles early: features emerge, die, and reorganise ~40× faster in the first 1% of training than afterwards, and the scaffold is already largely fixed by then. (ii) It is load-bearing: joint cross-layer ablation identifies the carriers as far more load-bearing than any count-matched non-scaffold population, a gap invisible to per-firing single-feature methods. (iii) Function precedes direction: which features will become carriers is already predictable from training-onset firing patterns alone, correctly distinguishing future carriers from non-carriers in 4 of 5 cases, before the geometry has settled. (iv) It seeds subsequent development: by the end of training, scaffold carriers have recruited 64% of all active features into the scaffold hierarchy. Life history is consistent with a two-phase account of training: selection appears to largely determine the scaffold in the first 1%, while the remaining 99% appears to calibrate geometry around a substrate already set.
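Property (ii) compares joint ablation of the carriers against count-matched non-scaffold populations. The sketch below shows only the bookkeeping of such a comparison, under loudly stated assumptions: `ablation_loss` is a hypothetical, user-supplied function returning the model's loss when a given set of features is ablated jointly across layers, control sets are drawn uniformly at random from non-carrier features of the same size, and `count_matched_comparison` is an invented name, not the paper's code. The toy loss in the usage example is also invented and deliberately redundant, so removing any single carrier looks harmless while removing the carriers together does not; it illustrates why a joint test can expose an effect that single-feature scoring misses, not what the paper measured.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical interface: maps a set of feature IDs to the loss after jointly ablating them.
AblationLoss = Callable[[FrozenSet[int]], float]


def count_matched_comparison(
    carriers: Sequence[int],
    pool: Sequence[int],
    ablation_loss: AblationLoss,
    baseline_loss: float,
    n_controls: int = 100,
    seed: int = 0,
) -> Tuple[float, List[float], float]:
    """Return (carrier loss increase, control loss increases, fraction of controls at least as damaging)."""
    rng = random.Random(seed)
    carrier_set = frozenset(carriers)
    non_scaffold = [f for f in pool if f not in carrier_set]
    k = len(carrier_set)
    if len(non_scaffold) < k:
        raise ValueError("not enough non-scaffold features for a count-matched control")

    carrier_delta = ablation_loss(carrier_set) - baseline_loss
    control_deltas = [
        ablation_loss(frozenset(rng.sample(non_scaffold, k))) - baseline_loss
        for _ in range(n_controls)
    ]
    # Empirical fraction of random count-matched controls that hurt at least as much.
    frac_as_bad = sum(d >= carrier_delta for d in control_deltas) / n_controls
    return carrier_delta, control_deltas, frac_as_bad


if __name__ == "__main__":
    carriers = list(range(5))
    pool = list(range(100))
    base = 2.0

    def toy_loss(ablated: FrozenSet[int]) -> float:
        # Invented toy: each ablated feature costs 0.01, but carriers back each other up,
        # so a large extra cost appears only when at least 3 carriers are removed together.
        redundant_hit = 3.0 if len(ablated & set(carriers)) >= 3 else 0.0
        return base + 0.01 * len(ablated) + redundant_hit

    delta, controls, frac = count_matched_comparison(carriers, pool, toy_loss, base)
    print(round(delta, 2), round(max(controls), 2), frac)  # 3.05 0.05 0.0
    # Single-feature ablation cannot tell a carrier (0) from a non-carrier (50) here.
    print(toy_loss(frozenset({0})) == toy_loss(frozenset({50})))  # True
```

Reporting the fraction of controls at least as damaging as the carriers gives a simple empirical comparison against the count-matched baseline; with real models, each `ablation_loss` call would be a full evaluation pass, so `n_controls` trades statistical resolution against compute.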
