LGOCMLOct 1, 2020

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

arXiv:2010.00539v215 citations
AI Analysis

This provides theoretical guarantees for gradient descent in agnostic learning, addressing a bottleneck in robust classification for machine learning applications, though it is incremental as it builds on existing soft margin concepts.

The paper tackles the problem of agnostic learning of linear halfspaces using gradient descent on convex surrogates, showing that it achieves classification error of approximately the square root of the optimal error plus a small term, with polynomial time and sample complexity for distributions like log-concave isotropic ones.

We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity for a broad class of distributions that includes log-concave isotropic distributions as a subclass. Along the way we answer a question recently posed by Ji et al. (2020) on how the tail behavior of a loss function can affect sample complexity and runtime guarantees for gradient descent.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes