Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks
This work provides theoretical insight into the training dynamics of neural networks with small initializations, an incremental step toward understanding optimization behavior in machine learning.
The paper analyzes gradient flow dynamics in two-homogeneous neural networks initialized near the origin, showing that for square and logistic losses the weights approximately converge in direction to KKT points of a neural correlation function; it extends this to directional convergence near certain saddle points.
This paper examines gradient flow dynamics of two-homogeneous neural networks for small initializations, where all weights are initialized near the origin. For both square and logistic losses, it is shown that for sufficiently small initializations, the gradient flow dynamics spend sufficient time in the neighborhood of the origin to allow the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function that quantifies the correlation between the output of the neural network and corresponding labels in the training data set. For square loss, it has been observed that neural networks undergo saddle-to-saddle dynamics when initialized close to the origin. Motivated by this, this paper also shows a similar directional convergence among weights of small magnitude in the neighborhood of certain saddle points.
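To make the setting concrete, the following is a minimal numerical sketch, not the paper's own experiment. It assumes a two-layer ReLU network (one example of a two-homogeneous model), synthetic Gaussian data with random ±1 labels, square loss, and plain gradient descent with a small step size as a stand-in for gradient flow. The neural correlation function is taken here to be the inner product between the labels and the network outputs, which is one reading of the description above. All names and sizes (`network`, `correlation`, `delta`, `radius`, and so on) are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 20, 5, 10                      # hypothetical problem sizes
X = rng.standard_normal((n, d))              # synthetic inputs
y = rng.choice([-1.0, 1.0], size=n)          # synthetic labels


def unpack(w):
    """Split the flat parameter vector into hidden weights W and output weights v."""
    W = w[: width * d].reshape(width, d)
    v = w[width * d:]
    return W, v


def network(w):
    """Two-layer ReLU network outputs on X; two-homogeneous in w."""
    W, v = unpack(w)
    return np.maximum(X @ W.T, 0.0) @ v


def correlation(w):
    """Assumed form of the neural correlation function: labels dotted with outputs."""
    return y @ network(w)


def loss_grad(w):
    """Gradient of the square loss 0.5 * ||network(w) - y||^2 with respect to w."""
    W, v = unpack(w)
    pre = X @ W.T                            # pre-activations, shape (n, width)
    act = np.maximum(pre, 0.0)
    r = act @ v - y                          # residuals, shape (n,)
    grad_v = act.T @ r
    grad_W = ((pre > 0) * np.outer(r, v)).T @ X
    return np.concatenate([grad_W.ravel(), grad_v])


delta, lr, max_steps, radius = 1e-6, 1e-3, 20000, 1e-3
w0 = rng.standard_normal(width * d + width)
w0 /= np.linalg.norm(w0)                     # unit-norm initial direction
w = delta * w0                               # small initialization near the origin

# Follow the discretized dynamics only while the weights stay near the origin.
for step in range(max_steps):
    if np.linalg.norm(w) > radius:
        break
    w = w - lr * loss_grad(w)

final_norm = np.linalg.norm(w)
direction = w / final_norm
corr_start = correlation(w0)                 # correlation of the initial direction
corr_end = correlation(direction)            # correlation of the final direction

print("steps taken:", step)
print("final weight norm:", final_norm)
print("correlation of direction (start, end):", corr_start, corr_end)
```

While the weights remain small, the residual is close to the negative labels, so each update approximately ascends the correlation function. In this sketch the normalized weights should therefore move toward directions of higher correlation while the norm is still tiny, which is the qualitative behavior the abstract describes.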