LG · Nov 30, 2025

The Spectral Dimension of NTKs is Constant: A Theory of Implicit Regularization, Finite-Width Stability, and Scalable Estimation

arXiv:2512.00860v1 · h-index: 6
Originality: Incremental advance
AI Analysis

For researchers studying implicit regularization in deep learning, this work offers an incremental theoretical result on NTK properties: the effective rank of the NTK Gram matrix converges to a constant as the sample size grows.

The paper studies the low intrinsic complexity of overparameterized deep networks through the effective rank of the Neural Tangent Kernel (NTK) Gram matrix. It proves a constant-limit law for the infinite-width NTK, shows that the effective rank is stable at finite width, and proposes a scalable estimator. Experiments on CIFAR-10 report effective ranks of about 1.0 to 1.3, with slopes near zero across sample sizes.

Modern deep networks are heavily overparameterized yet often generalize well, suggesting a form of low intrinsic complexity not reflected by parameter counts. We study this complexity at initialization through the effective rank of the Neural Tangent Kernel (NTK) Gram matrix, $r_{\text{eff}}(K) = (\text{tr}(K))^2/\|K\|_F^2$. For i.i.d. data and the infinite-width NTK $k$, we prove a constant-limit law $\lim_{n\to\infty} \mathbb{E}[r_{\text{eff}}(K_n)] = \mathbb{E}[k(x, x)]^2 / \mathbb{E}[k(x, x')^2] =: r_\infty$, with sub-Gaussian concentration. We further establish finite-width stability: if the finite-width NTK deviates in operator norm by $O_p(m^{-1/2})$ (width $m$), then $r_{\text{eff}}$ changes by $O_p(m^{-1/2})$. We design a scalable estimator using random output probes and a CountSketch of parameter Jacobians and prove conditional unbiasedness and consistency with explicit variance bounds. On CIFAR-10 with ResNet-20/56 (widths 16/32) across $n \in \{10^3, 5\times10^3, 10^4, 2.5\times10^4, 5\times10^4\}$, we observe $r_{\text{eff}} \approx 1.0\text{--}1.3$ and slopes $\approx 0$ in $n$, consistent with the theory, and the kernel-moment prediction closely matches fitted constants.
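
To illustrate the constant-limit law, the sketch below computes $r_{\text{eff}}$ for Gram matrices of growing size and compares it with the kernel-moment ratio. It is an illustration under stated assumptions, not the paper's experiment: a Gaussian RBF kernel stands in for the NTK, the data are i.i.d. standard Gaussian, and the names `effective_rank`, `rbf_gram`, and `predicted_limit` are hypothetical. The intuition is that $\text{tr}(K_n) \approx n\,\mathbb{E}[k(x, x)]$ while $\|K_n\|_F^2 \approx n^2\,\mathbb{E}[k(x, x')^2]$ for large $n$, so the ratio stops depending on $n$. For this stand-in kernel, $k(x, x) = 1$ and $\mathbb{E}[k(x, x')^2] = (1 + 8\gamma)^{-d/2}$, which gives a predicted limit of $(1 + 8\gamma)^{d/2}$.

```python
import numpy as np


def effective_rank(K):
    """r_eff(K) = tr(K)^2 / ||K||_F^2 for a square Gram matrix K."""
    return np.trace(K) ** 2 / np.sum(K ** 2)


def rbf_gram(X, gamma):
    """Gram matrix of k(x, x') = exp(-gamma * ||x - x'||^2).

    Assumption: the RBF kernel is only a stand-in for the NTK here.
    """
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)


def predicted_limit(gamma, d):
    """Kernel-moment ratio E[k(x,x)]^2 / E[k(x,x')^2] for x, x' ~ N(0, I_d).

    For the RBF kernel, k(x, x) = 1 and E[k(x, x')^2] = (1 + 8*gamma)^(-d/2).
    """
    return (1.0 + 8.0 * gamma) ** (d / 2.0)


rng = np.random.default_rng(0)
d, gamma = 2, 0.05  # illustrative values, not taken from the paper
r_inf = predicted_limit(gamma, d)

results = {}
for n in (250, 1000, 2000):
    X = rng.standard_normal((n, d))
    results[n] = effective_rank(rbf_gram(X, gamma))
    print(f"n={n:5d}  r_eff={results[n]:.4f}  predicted limit={r_inf:.4f}")
```

With `d = 2` and `gamma = 0.05` the predicted limit is 1.4, and the empirical values should stay close to it at every `n`, which is the near-zero slope in $n$ that the abstract describes. Two sanity checks on the definition: the identity matrix of size $n$ has effective rank $n$, and any rank-one matrix has effective rank 1.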

