LG · OC · ML · Feb 28, 2020

Global Convergence and Geometric Characterization of Slow to Fast Weight Evolution in Neural Network Training for Classifying Linearly Non-Separable Data

arXiv:2002.12563v3 · 2 citations
AI Analysis

This paper provides theoretical convergence guarantees for neural network training on linearly non-separable data. The contribution is incremental: it extends existing analyses to the non-separable case.

The paper studies gradient descent dynamics in neural networks that classify linearly non-separable data. It shows that, with sufficiently many neurons, all critical points are global minima with perfect classification and gradient descent converges globally. It also identifies a geometric condition on the weights that marks the transition from slow to fast weight evolution.

In this paper, we study the dynamics of gradient descent in learning neural networks for classification problems. Unlike in existing works, we consider the linearly non-separable case where the training data of different classes lie in orthogonal subspaces. We show that when the network has a sufficient (but not exceedingly large) number of neurons, (1) the corresponding minimization problem has a desirable landscape where all critical points are global minima with perfect classification; (2) gradient descent is guaranteed to converge to the global minima. Moreover, we discover a geometric condition on the network weights such that, when it is satisfied, the weight evolution transitions from a slow phase of weight direction spreading to a fast phase of weight convergence. The geometric condition says that the convex hull of the weights projected on the unit sphere contains the origin.
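
The geometric condition in the abstract can be illustrated with a small check. The sketch below is not taken from the paper: it assumes two-dimensional weights for simplicity, and the function name `hull_contains_origin_2d` is hypothetical. In 2-D, the convex hull of points on the unit circle contains the origin exactly when the directions do not all fit inside an open half-plane, which is the same as saying that no angular gap between neighbouring directions exceeds pi. Higher-dimensional weights would need a different test, for example a linear-programming feasibility check.

```python
import math


def hull_contains_origin_2d(weights, tol=1e-9):
    """Return True if the convex hull of the unit-normalized 2-D weights
    contains the origin (illustrative 2-D special case only)."""
    # Projecting a nonzero weight onto the unit circle keeps only its angle.
    angles = sorted(
        math.atan2(y, x) for x, y in weights if math.hypot(x, y) > tol
    )
    if len(angles) < 2:
        return False  # fewer than two directions cannot cover the origin
    # Gaps between consecutive directions, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    # A gap larger than pi means every direction lies in an open half-plane,
    # so the origin is outside the hull.
    return max(gaps) <= math.pi + tol


# Directions clustered on one side: condition not satisfied.
clustered = [(1.0, 0.1), (2.0, -0.3), (0.5, 0.4)]
# Directions spread around the circle: condition satisfied.
spread = [(1.0, 0.0), (-1.0, 1.0), (-1.0, -1.0)]

print(hull_contains_origin_2d(clustered))
print(hull_contains_origin_2d(spread))
```

Only the directions of the weights matter here, which matches the abstract's description of the slow phase as one of weight direction spreading: the check starts to pass once the directions have spread enough to surround the origin.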

