MLLGSep 5, 2024

Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model

arXiv:2409.03703v1h-index: 20
Originality Incremental advance
AI Analysis

This addresses robust learning in adversarial settings for machine learning practitioners, offering incremental improvements in bounds and efficiency over existing methods.

The paper tackles the problem of learning single neuron models and linear regression under adversarial corruption of labels and covariates, deriving approximation bounds of O(ν√(ε log(1/ε))) for non-linear activations and O(νε log(1/ε)) for linear regression, with improved runtime complexity compared to prior work.

We derive approximation bounds for learning single neuron models using thresholded gradient descent when both the labels and the covariates are possibly corrupted adversarially. We assume the data follows the model $y = σ(\mathbf{w}^{*} \cdot \mathbf{x}) + ξ,$ where $σ$ is a nonlinear activation function, the noise $ξ$ is Gaussian, and the covariate vector $\mathbf{x}$ is sampled from a sub-Gaussian distribution. We study sigmoidal, leaky-ReLU, and ReLU activation functions and derive a $O(ν\sqrt{ε\log(1/ε)})$ approximation bound in $\ell_{2}$-norm, with sample complexity $O(d/ε)$ and failure probability $e^{-Ω(d)}$. We also study the linear regression problem, where $σ(\mathbf{x}) = \mathbf{x}$. We derive a $O(νε\log(1/ε))$ approximation bound, improving upon the previous $O(ν)$ approximation bounds for the gradient-descent based iterative thresholding algorithms of Bhatia et al. (NeurIPS 2015) and Shen and Sanghavi (ICML 2019). Our algorithm has a $O(\textrm{polylog}(N,d)\log(R/ε))$ runtime complexity when $\|\mathbf{w}^{*}\|_2 \leq R$, improving upon the $O(\text{polylog}(N,d)/ε^2)$ runtime complexity of Awasthi et al. (NeurIPS 2022).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes