OCLGFeb 6, 2020

Global Convergence of Frank Wolfe on One Hidden Layer Networks

arXiv:2002.02208v14 citations
AI Analysis

This provides theoretical guarantees for a specific optimization method in neural network training, which is incremental as it applies to a narrow class of networks.

The paper tackles the problem of training one-hidden-layer neural networks with ReLU activation by deriving global convergence bounds for the Frank Wolfe algorithm, showing it converges with a rate of O(1/T) under certain data assumptions.

We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks. When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program. The classical Frank Wolfe algorithm then converges with rate $O(1/T)$ where $T$ is both the number of neurons and the number of calls to the oracle.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes