DSLGOCMLJun 12, 2020

Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

arXiv:2006.06980v145 citations
Originality Incremental advance
AI Analysis

This addresses robust covariance estimation for corrupted data, a fundamental problem in statistics and machine learning, with incremental improvements in efficiency and guarantees.

The paper tackles robust PCA for corrupted sub-Gaussian data, developing polynomial-time algorithms that return approximate top eigenvectors with guarantees like 1 - O(ε log ε^{-1}) approximation, and introduces width-independent solvers for Schatten-p norm packing SDPs with (1 + ε)-approximate solutions in efficient iterations.

We develop two methods for the following fundamental statistical task: given an $ε$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $1 - O(ε\logε^{-1})$-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + ε)$-approximate solution in $O(p\log(\tfrac{nd}ε)ε^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes