Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
This provides theoretical insights into why local gradient methods work well for neural networks, with exponential improvement over prior results, though it is incremental in advancing optimization theory for deep learning.
The paper tackles the problem of training two-layer ReLU neural networks by analyzing the optimality gap between non-convex networks and their convex relaxations, showing that under random data, the gap is bounded by O(log n^0.5) and leading to a polynomial-time algorithm with logarithmic approximation guarantees.
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of O(log n^0.5), where n is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.