Geoffrey Chinot

ST
5papers
90citations
Novelty58%
AI Score26

5 Papers

STMay 5, 2021
AdaBoost and robust one-bit compressed sensing

Geoffrey Chinot, Felix Kuchelmeister, Matthias Löffler et al.

This paper studies binary classification in robust one-bit compressed sensing with adversarial errors. It is assumed that the model is overparameterized and that the parameter of interest is effectively sparse. AdaBoost is considered, and, through its relation to the max-$\ell_1$-margin-classifier, prediction error bounds are derived. The developed theory is general and allows for heavy-tailed feature distributions, requiring only a weak moment assumption and an anti-concentration condition. Improved convergence rates are shown when the features satisfy a small deviation lower bound. In particular, the results provide an explanation why interpolating adversarial noise can be harmless for classification problems. Simulations illustrate the presented theory.

STDec 1, 2020
On the robustness of minimum norm interpolators and regularized empirical risk minimizers

Geoffrey Chinot, Matthias Löffler, Sara van de Geer

This article develops a general theory for minimum norm interpolating estimators and regularized empirical risk minimizers (RERM) in linear models in the presence of additive, potentially adversarial, errors. In particular, no conditions on the errors are imposed. A quantitative bound for the prediction error is given, relating it to the Rademacher complexity of the covariates, the norm of the minimum norm interpolator of the errors and the size of the subdifferential around the true parameter. The general theory is illustrated for Gaussian features and several norms: The $\ell_1$, $\ell_2$, group Lasso and nuclear norms. In case of sparsity or low-rank inducing norms, minimum norm interpolators and RERM yield a prediction error of the order of the average noise level, provided that the overparameterization is at least a logarithmic factor larger than the number of samples and that, in case of RERM, the regularization parameter is small enough. Lower bounds that show near optimality of the results complement the analysis.

STMar 12, 2020
On the robustness of the minimum $\ell_2$ interpolator

Geoffrey Chinot, Matthieu Lerasle

We analyse the interpolator with minimal $\ell_2$-norm $\hatβ$ in a general high dimensional linear regression framework where $\mathbb Y=\mathbb Xβ^*+ξ$ where $\mathbb X$ is a random $n\times p$ matrix with independent $\mathcal N(0,Σ)$ rows and without assumption on the noise vector $ξ\in \mathbb R^n$. We prove that, with high probability, the prediction loss of this estimator is bounded from above by $(\|β^*\|^2_2r_{cn}(Σ)\vee \|ξ\|^2)/n$, where $r_{k}(Σ)=\sum_{i\geq k}λ_i(Σ)$ are the rests of the sum of eigenvalues of $Σ$. These bounds show a transition in the rates. For high signal to noise ratios, the rates $\|β^*\|^2_2r_{cn}(Σ)/n$ broadly improve the existing ones. For low signal to noise ratio, we also provide lower bound holding with large probability. Under assumptions on the sprectrum of $Σ$, this lower bound is of order $\| ξ\|_2^2/n$, matching the upper bound. Consequently, in the large noise regime, we are able to precisely track the prediction error with large probability. This results give new insight when the interpolation can be harmless in high dimensions.

STOct 24, 2019
ERM and RERM are optimal estimators for regression problems when malicious outliers corrupt the labels

Geoffrey Chinot

We study Empirical Risk Minimizers (ERM) and Regularized Empirical Risk Minimizers (RERM) for regression problems with convex and $L$-Lipschitz loss functions. We consider a setting where $|\cO|$ malicious outliers contaminate the labels. In that case, under a local Bernstein condition, we show that the $L_2$-error rate is bounded by $ r_N + AL |\cO|/N$, where $N$ is the total number of observations, $r_N$ is the $L_2$-error rate in the non-contaminated setting and $A$ is a parameter coming from the local Bernstein condition. When $r_N$ is minimax-rate-optimal in a non-contaminated setting, the rate $r_N + AL|\cO|/N$ is also minimax-rate-optimal when $|\cO|$ outliers contaminate the label. The main results of the paper can be used for many non-regularized and regularized procedures under weak assumptions on the noise. We present results for Huber's M-estimators (without penalization or regularized by the $\ell_1$-norm) and for general regularized learning problems in reproducible kernel Hilbert spaces when the noise can be heavy-tailed.

MLMay 23, 2019
Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki

Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks. Most studies especially focused on the regression problems with the squared loss function, except for a few, and the importance of the positivity of the neural tangent kernel has been pointed out. On the other hand, the performance of gradient descent on classification problems using the logistic loss function has not been well studied, and further investigation of this problem structure is possible. In this work, we demonstrate that the separability assumption using a neural tangent model is more reasonable than the positivity condition of the neural tangent kernel and provide a refined convergence analysis of the gradient descent for two-layer networks with smooth activations. A remarkable point of our result is that our convergence and generalization bounds have much better dependence on the network width in comparison to related studies. Consequently, our theory provides a generalization guarantee for less over-parameterized two-layer networks, while most studies require much higher over-parameterization.