DS LG ST MLMar 4, 2024

Statistical Query Lower Bounds for Learning Truncated Gaussians

Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis

arXiv:2403.02300v15.16 citationsh-index: 48COLT

Originality Incremental advance

AI Analysis

This reveals an information-computation gap for truncated Gaussian mean estimation, which is incremental as it builds on prior work in SQ lower bounds.

The paper tackles the problem of estimating the mean of a truncated Gaussian distribution, showing a super-polynomial Statistical Query (SQ) lower bound of d^{poly(1/ε)} for algorithms, even when poly(d/ε) samples suffice information-theoretically.

We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set $S \subseteq \mathbb{R}^d$, we are given access to samples from the distribution $\mathcal{N}(\boldsymbol{ μ}, \mathbf{ I})$ truncated to the set $S$. The goal is to estimate $\boldsymbolμ$ within accuracy $ε>0$ in $\ell_2$-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is $d^{\mathrm{poly}(1/ε)}$, even when the class $\mathcal{C}$ is simple so that $\mathrm{poly}(d/ε)$ samples information-theoretically suffice. Concretely, our SQ lower bound applies when $\mathcal{C}$ is a union of a bounded number of rectangles whose VC dimension and Gaussian surface are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.

View on arXiv PDF

Similar