MLSep 21, 2024
Consistency for Large Neural Networks: Regression and ClassificationHaoran Zhan, Yingcun Xia
Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well known double descents phenomenon suggests that the test error curve of neural networks decreases monotonically as model size grows and eventually converges to a non-zero constant. This work aims to explain the theoretical mechanism underlying this tail behavior and study the statistical consistency of deep overparameterized neural networks in many different learning tasks including regression and classification. Firstly, we prove that as the number of parameters increases, the approximation error decreases monotonically, while explicit or implicit regularization (e.g., weight decay) keeps the generalization error existing but bounded. Consequently, the overall error curve eventually converges to a constant determined by the bounded generalization error and the optimization error. Secondly, we prove that deep overparameterized neural networks are statistical consistency across multiple learning tasks if regularization technique is used. Our theoretical findings coincide with numerical experiments and provide a perspective for understanding the generalization behavior of overparameterized neural networks.
34.3MLApr 25
Learning Curves and Benign Overfitting of Spectral Algorithms in Large DimensionsWeihao Lu, Qian Lin, Yingcun Xia et al.
Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the large-dimensional setting where the sample size and dimension are of comparable order, i.e., $n \asymp d^γ$ for some $γ>0$. We first consider inner-product kernels on the sphere $\mathbb{S}^{d-1}$ and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions $s \geq 0$, where $s$ measures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever $s$ is positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in $\mathbb{R}^d$ whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions.