LGOCMLMar 3, 2023

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

Berkeley
arXiv:2303.02255v29 citationsh-index: 96
AI Analysis

This work addresses the challenge of efficient learning in overparameterized neural networks for researchers and practitioners, offering theoretical insights into algorithm choice, but it is incremental as it builds on existing methods like GLM-tron.

This paper tackles the problem of learning a single ReLU neuron in high-dimensional, overparameterized settings by analyzing the GLM-tron algorithm, providing dimension-free risk upper bounds and matching lower bounds that characterize its performance sharply, while showing that SGD performs no better and can suffer constant risk in some cases.

This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we provide an instance-wise matching risk lower bound for GLM-tron. Our upper and lower risk bounds provide a sharp characterization of the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation. These results together suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes