DSLGSTMLOct 28, 2024

SoS Certifiability of Subgaussian Distributions and its Algorithmic Applications

arXiv:2410.21194v113 citationsh-index: 48STOC
Originality Highly original
AI Analysis

This work addresses the problem of efficient algorithm design for high-dimensional statistics, offering broad algorithmic applications with near-optimal guarantees, though it is incremental in building on existing SoS and subgaussian theory.

The paper proves that every centered subgaussian distribution is SoS-certifiably subgaussian, enabling efficient learning algorithms for high-dimensional statistical tasks. As a result, it provides computationally efficient algorithms with near-optimal guarantees for tasks like robust mean estimation, clustering, and robust linear regression from subgaussian samples.

We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb N$, every centered subgaussian distribution $\mathcal D$ on $\mathbb R^d$, and every even $p \in \mathbb N$, the $d$-variate polynomial $(Cp)^{p/2} \cdot \|v\|_{2}^p - \mathbb E_{X \sim \mathcal D} \langle v,X\rangle^p$ is a sum of square polynomials. This establishes that every subgaussian distribution is \emph{SoS-certifiably subgaussian} -- a condition that yields efficient learning algorithms for a wide variety of high-dimensional statistical tasks. As a direct corollary, we obtain computationally efficient algorithms with near-optimal guarantees for the following tasks, when given samples from an arbitrary subgaussian distribution: robust mean estimation, list-decodable mean estimation, clustering mean-separated mixture models, robust covariance-aware mean estimation, robust covariance estimation, and robust linear regression. Our proof makes essential use of Talagrand's generic chaining/majorizing measures theorem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes