Curvature-Aligned Probing for Local Loss-Landscape Stabilization
This work addresses the computational bottleneck of probing high-dimensional neural loss landscapes by introducing a principled, scalable method that maintains accuracy, which is valuable for understanding generalization in deep learning.
The authors propose a curvature-aligned probing criterion for measuring local loss-landscape stabilization in neural networks, which focuses on the top-D eigenspace of the empirical Hessian. They prove it preserves the O(k^{-2}) rate of full-space criteria while reducing dimensionality, and demonstrate on a transformer that it reproduces full-space signals to numerical noise with orders-of-magnitude speedup.
Local loss-landscape stabilization under sample growth is typically measured either pointwise or through isotropic averaging in the full parameter space. Despite practical value, both choices probe directions that contribute little to the dominant local deformation of strongly anisotropic neural landscapes. We recast stabilization as an observational problem and introduce a unified family of criteria parameterized by an aggregation order and a probing distribution; within this family we propose a curvature-aligned criterion $Δ_2^{(D)}$ that probes the loss increment field in the top-$D$ eigenspace of the empirical Hessian near a trained solution. Solely from a local quadratic model, we prove that $Δ_2^{(D)}$ preserves the $O(k^{-2})$ mean-squared rate of the full-space criterion while replacing ambient-dimension curvature dependence with dependence on the subspace dimension $D$; a corollary gives a closed-form spectral expression and a proposition identifies the top-$D$ eigenspace as extremal within the eigenspace-aligned family. We also derive scalable estimators based on Hessian-vector products, subspace Monte Carlo, and a closed-form Gaussian-moment proxy. On a decoder-only transformer, a curvature-aligned probe occupying a tiny fraction of parameter space already reproduces the full-space mean-squared signal to within numerical noise throughout the validated local regime, and the closed-form estimator is orders of magnitude faster than direct Monte Carlo after subspace construction.