LGMLFeb 25, 2020

Convex Geometry and Duality of Over-parameterized Neural Networks

arXiv:2002.11219v470 citations
AI Analysis

This provides a novel theoretical framework for understanding and optimizing over-parameterized neural networks, addressing limitations of existing methods like the Neural Tangent Kernel, which is incremental in offering new convex geometric insights.

The authors tackled the problem of analyzing finite-width two-layer ReLU networks by developing a convex analytic approach, proving that optimal solutions can be characterized as extreme points of a convex set and showing that this leads to linear spline interpolation for certain regression problems and interpretable decision regions for classification.

We develop a convex analytic approach to analyze finite width two-layer ReLU networks. We first prove that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set, where simple solutions are encouraged via its convex geometrical properties. We then leverage this characterization to show that an optimal set of parameters yield linear spline interpolation for regression problems involving one dimensional or rank-one data. We also characterize the classification decision regions in terms of a kernel matrix and minimum $\ell_1$-norm solutions. This is in contrast to Neural Tangent Kernel which is unable to explain predictions of finite width networks. Our convex geometric characterization also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints. Then, we apply certain convex relaxations and introduce a cutting-plane algorithm to globally optimize the network. We further analyze the exactness of the relaxations to provide conditions for the convergence to a global optimum. Our analysis also shows that optimal network parameters can be also characterized as interpretable closed-form formulas in some practically relevant special cases.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes