ML DIS-NN LG ST

Feb 26, 2020

The role of regularization in classification of high-dimensional noisy Gaussian mixture

Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

arXiv:2002.11544v1

27.4100 citations

Originality: Incremental advance

AI Analysis

This work addresses fundamental challenges in high-dimensional statistics and machine learning, offering insights into regularization effects for noisy data classification, though it appears incremental in extending existing theoretical frameworks.

The paper tackles the problem of classifying high-dimensional noisy Gaussian mixtures, providing a rigorous analysis of generalization error for regularized convex classifiers like ridge, hinge, and logistic regression in the high-dimensional limit. It finds that regularization can sometimes achieve Bayes-optimal performance and illustrates phenomena such as the interpolation peak and cluster size effects.

We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and their dimension $d$ go to infinity while their ratio is fixed to $\alpha = n/d$. We discuss surprising effects of the regularization, which in some cases allows one to reach Bayes-optimal performance. We also illustrate the interpolation peak at low regularization, and analyze the role of the respective sizes of the two clusters.
