The asymptotic spectrum of the Hessian of DNN throughout training
This work addresses a theoretical problem in the analysis of DNN optimization dynamics: it uses the Neural Tangent Kernel to describe how the Hessian of the cost behaves at initialization and during training. It is aimed at researchers in machine learning theory.
The authors study the Hessian spectrum of deep neural networks (DNNs) during training by leveraging the Neural Tangent Kernel (NTK). When the NTK is fixed during training, they characterize the full asymptotic spectrum of the Hessian, both at initialization and during training. In the mean-field limit, where the NTK is not fixed during training, they describe the first two moments of the Hessian at initialization.
The dynamics of DNNs during gradient descent are described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs. When the NTK is fixed during training, we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training. In the so-called mean-field limit, where the NTK is not fixed during training, we describe the first two moments of the Hessian at initialization.
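
One way to see why the NTK carries information about the Hessian is the following minimal sketch. It is an illustration under stated assumptions, not the construction used in the article: it assumes a mean squared error cost and a hypothetical one-hidden-layer tanh network with 1/sqrt(n) output scaling. For such a cost, the chain rule splits the Hessian into a Gauss-Newton term J^T J / N, where J is the Jacobian of the N network outputs with respect to the parameters, plus a term involving second derivatives of the outputs. The empirical NTK Gram matrix is J J^T, so its eigenvalues divided by N coincide with the nonzero eigenvalues of the Gauss-Newton term. All names below (`jacobian`, `ntk_gram`, `gauss_newton`) are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (an assumption, not taken from the article):
# N inputs of dimension d, one hidden layer of width n, and a scalar
# output f(x) = a . tanh(W x) / sqrt(n).
N, d, n = 5, 3, 50
X = rng.standard_normal((N, d))
W = rng.standard_normal((n, d))
a = rng.standard_normal(n)


def jacobian(X, W, a):
    """Jacobian of the N outputs with respect to all parameters (W, then a)."""
    width = W.shape[0]
    hidden = np.tanh(X @ W.T)                                    # (N, n)
    # d f / d W_ij = a_i * (1 - tanh^2(W_i . x)) * x_j / sqrt(n)
    dW = (a * (1.0 - hidden**2))[:, :, None] * X[:, None, :]     # (N, n, d)
    # d f / d a_i = tanh(W_i . x) / sqrt(n)
    da = hidden                                                  # (N, n)
    return np.hstack([dW.reshape(len(X), -1), da]) / np.sqrt(width)


J = jacobian(X, W, a)            # shape (N, P) with P = n*d + n parameters
ntk_gram = J @ J.T               # (N, N) empirical NTK Gram matrix
gauss_newton = J.T @ J / N       # (P, P) Gauss-Newton part of the MSE Hessian

# Sort both spectra in decreasing order before comparing them.
ntk_eigs = np.sort(np.linalg.eigvalsh(ntk_gram))[::-1] / N
gn_eigs = np.sort(np.linalg.eigvalsh(gauss_newton))[::-1]

print("NTK Gram eigenvalues / N:    ", ntk_eigs)
print("top Gauss-Newton eigenvalues:", gn_eigs[:N])
```

The Gauss-Newton term has at most N nonzero eigenvalues, so the remaining P - N entries of `gn_eigs` are zero up to rounding error. The sketch covers only this one term of the Hessian at a single parameter value; it does not reproduce the asymptotic results stated in the abstract.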