Learning Halfspaces and Neural Networks with Random Initialization
This addresses the challenge of efficiently learning complex models like neural networks in non-convex settings, offering theoretical guarantees for practitioners in machine learning, though it appears incremental as it builds on existing optimization methods with random initialization.
The paper tackles the problem of non-convex empirical risk minimization for learning halfspaces and neural networks, presenting algorithms that achieve arbitrarily small excess risk ε>0 with time complexity polynomial in input dimension and sample size but exponential in (L/ε²)log(L/ε), and for separable data with constant margin γ>0, it provides a polynomial-time algorithm achieving arbitrary generalization error ε>0 with poly(d,1/ε) sample and time complexity.
We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $ε>0$. The time complexity is polynomial in the input dimension $d$ and the sample size $n$, but exponential in the quantity $(L/ε^2)\log(L/ε)$. These algorithms run multiple rounds of random initialization followed by arbitrary optimization steps. We further show that if the data is separable by some neural network with constant margin $γ>0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $Ω(γ)$. As a consequence, the algorithm achieves arbitrary generalization error $ε>0$ with ${\rm poly}(d,1/ε)$ sample and time complexity. We establish the same learnability result when the labels are randomly flipped with probability $η<1/2$.