LGOCMLApr 24, 2021

Achieving Small Test Error in Mildly Overparameterized Neural Networks

arXiv:2104.11895v13 citations
Originality Incremental advance
AI Analysis

This addresses the theoretical challenge of efficient optimization and generalization in neural networks for researchers in machine learning theory, though it is incremental as it builds on prior work on overparameterization.

The paper tackles the problem of finding a point with small test error for binary classification using two-layer mildly overparameterized ReLU neural networks in polynomial time, proving that such points exist in the loss landscape and providing polynomial-time algorithms for convolutional and fully connected networks under certain conditions.

Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization. Many existing works that study optimization and generalization together are based on neural tangent kernel and require a very large width. In this work, we are interested in the following question: for a binary classification problem with two-layer mildly over-parameterized ReLU network, can we find a point with small test error in polynomial time? We first show that the landscape of loss functions with explicit regularization has the following property: all local minima and certain other points which are only stationary in certain directions achieve small test error. We then prove that for convolutional neural nets, there is an algorithm which finds one of these points in polynomial time (in the input dimension and the number of data points). In addition, we prove that for a fully connected neural net, with an additional assumption on the data distribution, there is a polynomial time algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes