LG · AI · Oct 1, 2025

Feature Identification via the Empirical NTK

arXiv:2510.00468v2 · 2 citations · h-index: 1

Originality: Incremental advance

AI Analysis

The paper proposes eNTK eigenanalysis as a practical method for feature discovery and for detecting phase changes in small models. The advance is incremental because it builds on existing NTK theory.

The paper addresses the problem of identifying the features learned by neural networks through eigenanalysis of the empirical neural tangent kernel (eNTK). It shows that the top eNTK eigenspaces recover ground-truth features in two toy settings: Toy Models of Superposition (TMS), in both the sparse and dense regimes, and a 1-layer MLP trained on modular addition, where they recover Fourier feature families.
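For reference, here is a standard definition of the eNTK Gram matrix and its eigendecomposition, written for a network $f(x;\theta)$ with a scalar output evaluated on inputs $x_1,\dots,x_n$ (multi-output networks need a block-matrix version, and the paper's normalization may differ). Because the inner product splits over disjoint parameter blocks $\theta_\ell$, the kernel is a sum of layerwise kernels. A "spectral cliff" after index $k$ means $\lambda_k / \lambda_{k+1} \gg 1$. The alignment score $a_k(u)$ for a candidate ground-truth feature $g$ (for example, a Fourier component in the modular-addition model) is an illustrative choice assumed here, not necessarily the metric used in the paper.

$$
\begin{aligned}
\mathbf{K}_{ij} &= \nabla_\theta f(x_i;\theta)^{\top}\,\nabla_\theta f(x_j;\theta),
  \qquad \mathbf{K} = \sum_{r=1}^{n} \lambda_r\, v_r v_r^{\top},\quad \lambda_1 \ge \dots \ge \lambda_n \ge 0,\\
\mathbf{K} &= \sum_{\ell} \mathbf{K}^{(\ell)},
  \qquad \mathbf{K}^{(\ell)}_{ij} = \nabla_{\theta_\ell} f(x_i;\theta)^{\top}\,\nabla_{\theta_\ell} f(x_j;\theta),\\
a_k(u) &= \frac{\lVert V_k V_k^{\top} u \rVert_2}{\lVert u \rVert_2},
  \qquad V_k = [v_1,\dots,v_k],\quad u = \big(g(x_1),\dots,g(x_n)\big)^{\top}.
\end{aligned}
$$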

We provide evidence that eigenanalysis of the empirical neural tangent kernel (eNTK) can surface the features used by trained neural networks. Across two standard toy models for mechanistic interpretability, Toy Models of Superposition (TMS) and a 1-layer MLP trained on modular addition, we find that the eNTK exhibits sharp spectral cliffs whose top eigenspaces align with ground-truth features. In TMS, the eNTK recovers the ground-truth features in both the sparse (high superposition) and dense regimes. In modular arithmetic, the eNTK can be used to recover Fourier feature families. Moreover, we provide evidence that a layerwise eNTK localizes features to specific layers and that the evolution of the eNTK spectrum can be used to diagnose the grokking phase transition. These results suggest that eNTK analysis may provide a practical handle for feature discovery and for detecting phase changes in small models.
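The mechanics of a layerwise eNTK and a spectral-cliff check can be illustrated with a small NumPy sketch. Everything here is a hypothetical toy setup, not the paper's code: a 2-layer ReLU MLP with a scalar output at random initialization (so no learned features), analytic per-layer Jacobians, layerwise kernels `K1` and `K2` whose sum is the full eNTK, and a hypothetical `cliff_index` heuristic that places the cliff at the largest ratio between consecutive top eigenvalues. The paper's models, kernel normalization, and cliff criterion may differ.

```python
import numpy as np


def init_params(rng, d_in, d_hidden):
    """Random weights for a hypothetical toy 2-layer ReLU MLP with scalar output."""
    return {
        "W1": rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in),
        "b1": np.zeros(d_hidden),
        "w2": rng.normal(size=d_hidden) / np.sqrt(d_hidden),
        "b2": 0.0,
    }


def forward(params, X):
    """Scalar network output f(x) for each row of X."""
    h = X @ params["W1"].T + params["b1"]
    return np.maximum(h, 0.0) @ params["w2"] + params["b2"]


def layer_jacobians(params, X):
    """Per-layer Jacobians of f w.r.t. the parameters, one row per input.

    Returns (J1, J2): J1 covers (W1, b1), J2 covers (w2, b2).
    """
    h = X @ params["W1"].T + params["b1"]       # (n, H) pre-activations
    a = np.maximum(h, 0.0)                       # (n, H) ReLU activations
    df_dh = params["w2"] * (h > 0)               # (n, H) gradient through the ReLU
    dW1 = df_dh[:, :, None] * X[:, None, :]      # (n, H, D) outer products
    J1 = np.concatenate([dW1.reshape(len(X), -1), df_dh], axis=1)
    J2 = np.concatenate([a, np.ones((len(X), 1))], axis=1)
    return J1, J2


def entk(params, X):
    """Full eNTK Gram matrix and its layerwise pieces, with K = K1 + K2."""
    J1, J2 = layer_jacobians(params, X)
    K1, K2 = J1 @ J1.T, J2 @ J2.T
    return K1 + K2, K1, K2


def top_eigs(K):
    """Eigenvalues in descending order and matching eigenvectors (columns)."""
    vals, vecs = np.linalg.eigh(K)               # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def cliff_index(eigvals, max_k=20, eps=1e-12):
    """Hypothetical heuristic: count of eigenvalues above the largest gap lambda_i / lambda_{i+1}."""
    lam = np.clip(eigvals[: max_k + 1], eps, None)
    ratios = lam[:-1] / lam[1:]
    return int(np.argmax(ratios)) + 1


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
params = init_params(rng, d_in=5, d_hidden=16)
K, K1, K2 = entk(params, X)
vals, vecs = top_eigs(K)
k = cliff_index(vals)
print("top eigenvalues:", np.round(vals[:6], 3))
print("cliff after eigenvalue #", k)
```

Running `top_eigs` separately on `K1` and `K2` gives a per-layer spectrum, which is one plausible way to ask which layer carries a given eigendirection. To probe a phase transition in the spirit of the grokking experiment, one could recompute `top_eigs(entk(params, X)[0])` at saved training checkpoints and track how the leading eigenvalues and the cliff position move; that tracking procedure is likewise an assumption rather than the paper's protocol.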

