LGCCSTMLJul 28, 2022

Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems

arXiv:2207.14030v223 citationsh-index: 7
Originality Highly original
AI Analysis

This addresses the fundamental challenge of learning halfspaces for machine learning theorists, providing the first hardness results based on worst-case assumptions rather than average-case or statistical query models.

The paper tackles the problem of agnostically learning halfspaces and polynomial threshold functions by showing hardness results based on worst-case lattice problems, proving that no efficient algorithm can achieve misclassification error better than 1/2 - γ even when optimal error is small, with specific bounds like δ as small as exp(-Ω(log^{1-c}(d))).

We show hardness of improperly learning halfspaces in the agnostic model, both in the distribution-independent as well as the distribution-specific setting, based on the assumption that worst-case lattice problems, such as GapSVP or SIVP, are hard. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassfication error better than $\frac 1 2 - γ$ even if the optimal misclassification error is as small is as small as $δ$. Here, $γ$ can be smaller than the inverse of any polynomial in the dimension and $δ$ as small as $exp(-Ω(\log^{1-c}(d)))$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. For the distribution-specific setting, we show that if the marginal distribution is standard Gaussian, for any $β> 0$ learning halfspaces up to error $OPT_{LTF} + ε$ takes time at least $d^{\tildeΩ(1/ε^{2-β})}$ under the same hardness assumptions. Similarly, we show that learning degree-$\ell$ polynomial threshold functions up to error $OPT_{{PTF}_\ell} + ε$ takes time at least $d^{\tildeΩ(\ell^{2-β}/ε^{2-β})}$. $OPT_{LTF}$ and $OPT_{{PTF}_\ell}$ denote the best error achievable by any halfspace or polynomial threshold function, respectively. Our lower bounds qualitively match algorithmic guarantees and (nearly) recover known lower bounds based on non-worst-case assumptions. Previously, such hardness results [Daniely16, DKPZ21] were based on average-case complexity assumptions or restricted to the statistical query model. Our work gives the first hardness results basing these fundamental learning problems on worst-case complexity assumptions. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes