LGOCOct 13, 2021

The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program

arXiv:2110.06488v113 citations
Originality Highly original
AI Analysis

This provides theoretical insights into why neural networks trained with gradient descent often succeed, addressing a foundational problem in machine learning for researchers and practitioners.

The paper tackles the problem of understanding the convergence of non-convex gradient flows in training two-layer ReLU neural networks by characterizing their implicit bias as convex regularization and proving convergence to global optima under conditions like orthogonal separable data distributions.

We study non-convex subgradient flows for training two-layer ReLU neural networks from a convex geometry and duality perspective. We characterize the implicit bias of unregularized non-convex gradient flow as convex regularization of an equivalent convex model. We then show that the limit points of non-convex subgradient flows can be identified via primal-dual correspondence in this convex optimization problem. Moreover, we derive a sufficient condition on the dual variables which ensures that the stationary points of the non-convex objective are the KKT points of the convex objective, thus proving convergence of non-convex gradient flows to the global optimum. For a class of regular training data distributions such as orthogonal separable data, we show that this sufficient condition holds. Therefore, non-convex gradient flows in fact converge to optimal solutions of a convex optimization problem. We present numerical results verifying the predictions of our theory for non-convex subgradient descent.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes