LG OC ML

Mar 12, 2024

Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations

arXiv:2403.08121v3

10.47 citations

h-index: 2

Trans. Mach. Learn. Res.

Originality: Synthesis-oriented

AI Analysis

This paper offers theoretical insight into early training dynamics for researchers in deep learning theory. Its contribution is incremental: it builds on existing work on homogeneous networks and KKT analysis.

The paper analyzes gradient flow dynamics in deep homogeneous neural networks with small initializations, showing that weights remain small and converge directionally to KKT points of the neural correlation function early in training, and it derives conditions for rank-one KKT points in networks with specific activations.
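The objects named in this summary can be written down compactly. The following is a sketch under stated assumptions, not the paper's exact formulation: it assumes a network output $\mathcal{H}(\mathbf{x};\mathbf{w})$ that is differentiable in the weights $\mathbf{w}$, training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^n$, and a neural correlation function equal to the correlation between labels and outputs. The symbols $\mathcal{H}$, $\mathcal{N}$, $L$, and $\lambda$ are notation chosen here for illustration.

```latex
% L-homogeneity in the weights (the paper assumes L > 2):
\mathcal{H}(\mathbf{x}; c\,\mathbf{w}) = c^{L}\,\mathcal{H}(\mathbf{x}; \mathbf{w})
\quad \text{for all } c \ge 0 .

% Assumed form of the neural correlation function and its constrained problem:
\mathcal{N}(\mathbf{w}) = \sum_{i=1}^{n} y_i\, \mathcal{H}(\mathbf{x}_i; \mathbf{w}),
\qquad
\max_{\|\mathbf{w}\|_2^2 = 1} \mathcal{N}(\mathbf{w}) .

% KKT condition in the differentiable case:
\nabla \mathcal{N}(\mathbf{w}^\ast) = \lambda\, \mathbf{w}^\ast,
\qquad \|\mathbf{w}^\ast\|_2 = 1 .

% Euler's identity for homogeneous functions then fixes the multiplier:
\mathbf{w}^{\top} \nabla \mathcal{N}(\mathbf{w}) = L\, \mathcal{N}(\mathbf{w})
\;\;\Longrightarrow\;\;
\lambda = L\, \mathcal{N}(\mathbf{w}^\ast) .
```

Under these assumptions, a KKT point is a unit-norm weight vector at which the gradient of $\mathcal{N}$ is parallel to the weights themselves. For non-differentiable activations such as ReLU, the gradient would need to be replaced by a generalized (sub)gradient.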

This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It shows that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in (Euclidean) norm and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. The paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.
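The "order of homogeneity" in the abstract can be checked numerically on a toy model. The sketch below is illustrative and is not taken from the paper. It assumes a three-layer, bias-free ReLU network, which is 3-homogeneous in its weights, and a neural correlation function of the form sum_i y_i * H(x_i; w). All function names, data, and weight values are hypothetical. Plain ReLU does not have a locally Lipschitz gradient, so this toy model illustrates homogeneity only, not the assumptions of the gradient flow result.

```python
# Toy check of L-homogeneity for a 3-layer bias-free ReLU network (L = 3).
# All names and numbers are illustrative assumptions, not from the paper.

def relu(v):
    return [max(0.0, t) for t in v]

def matvec(W, v):
    return [sum(a * b for a, b in zip(row, v)) for row in W]

def network(x, W1, W2, w3):
    """H(x; w) = w3 . relu(W2 relu(W1 x)); scalar output, no biases."""
    h1 = relu(matvec(W1, x))
    h2 = relu(matvec(W2, h1))
    return sum(a * b for a, b in zip(w3, h2))

def scale(c, W1, W2, w3):
    """Multiply every weight in every layer by the same scalar c >= 0."""
    return ([[c * a for a in row] for row in W1],
            [[c * a for a in row] for row in W2],
            [c * a for a in w3])

def ncf(X, y, W1, W2, w3):
    """Assumed neural correlation function: sum_i y_i * H(x_i; w)."""
    return sum(yi * network(xi, W1, W2, w3) for xi, yi in zip(X, y))

# Hypothetical data and weights.
X = [[1.0, 2.0], [-1.0, 0.5], [0.3, -0.7]]
y = [1.0, -1.0, 0.5]
W1 = [[0.5, -0.2], [0.1, 0.4]]
W2 = [[0.3, 0.7], [-0.6, 0.2]]
w3 = [1.0, -0.5]

c = 0.1  # a "small initialization" scale
base = ncf(X, y, W1, W2, w3)
scaled = ncf(X, y, *scale(c, W1, W2, w3))

# Scaling all weights by c should scale the NCF by c**3.
print(base, scaled, c ** 3 * base)
```

Because each of the three layers contributes one factor of c, shrinking the weights by a factor of 10 shrinks the output by a factor of 1000. This is why the network output is negligible near a small initialization, leaving the labels to dominate the early dynamics.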
