LGCCJul 1, 2024

Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension

arXiv:2407.00966v212 citationsh-index: 40
Originality Highly original
AI Analysis

This work addresses the challenge of learning high-dimensional concepts efficiently for machine learning practitioners, offering a novel theoretical framework that yields practical algorithmic improvements.

The paper tackles the problem of learning complex concept classes like intersections of halfspaces by introducing a smoothed-analysis framework that competes with classifiers robust to Gaussian perturbations, enabling efficient learning for functions with low intrinsic dimension and bounded Gaussian surface area. It achieves a runtime of $k^{poly(\frac{\log k}{\epsilon\gamma})}$ for agnostically learning intersections of $k$-halfspaces, improving from exponential in $k$.

In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on $\mathbb{R}^d \times \{\pm 1\}$ -- is to output a hypothesis that is competitive (to within $ε$) of the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of $k$-halfspaces in time $k^{poly(\frac{\log k}{εγ}) }$ where $γ$ is the margin parameter. Before our work, the best-known runtime was exponential in $k$ (Arriaga and Vempala, 1999).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes