MLLGDec 7, 2022

Tight bounds for maximum $\ell_1$-margin classifiers

arXiv:2212.03783v21 citationsh-index: 17
Originality Incremental advance
AI Analysis

This work addresses a theoretical gap in understanding the statistical performance of popular iterative algorithms like boosting, revealing limitations in their adaptivity for sparse classification problems.

The paper tackles the problem of whether the maximum ℓ₁-margin classifier adapts to sparse ground truths in high-dimensional linear classification, and finds that it does not, proving tight bounds matching general rates for noiseless settings and showing benign overfitting with noisy data.

Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. sparse hard-margin SVM, in high dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard sparse ground truths. We show that surprisingly, this adaptivity does not apply to the maximum $\ell_1$-margin classifier for a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match existing rates of order $\frac{\|w^*\|_1^{2/3}}{n^{1/3}}$ for general ground truths. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order $\frac{1}{\sqrt{\log(d/n)}}$. We are therefore first to show benign overfitting for the maximum $\ell_1$-margin classifier.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes