STMLJun 16, 2013

Do semidefinite relaxations solve sparse PCA up to the information limit?

arXiv:1306.3690v495 citations
AI Analysis

This addresses a key theoretical gap in sparse PCA for statisticians and machine learning researchers, but it is incremental as it builds on prior work to refine computational limits.

The paper tackles the problem of recovering sparse principal components in high-dimensional statistics, specifically in a single-spike model, and shows that standard semidefinite programming (SDP) methods fail for sparsity levels k ≥ Ω(√n), while empirical results suggest recovery is possible up to k = O(√n) with a simple covariance thresholding algorithm.

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms were developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions can such algorithms recover the sparse principal components? We study this question for a single-spike model with an $\ell_0$-sparse eigenvector, in the asymptotic regime as dimension $p$ and sample size $n$ both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877-2921] proved that for sparsity levels $k\geqΩ(n/\log p)$, no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for $k\leq O(\sqrt{n/\log p})$, diagonal thresholding is consistent. It was further conjectured that an SDP approach may close this gap between computational and information limits. We prove that when $k\geqΩ(\sqrt{n})$, the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally-efficient algorithm can recover a spike of $\ell_0$-sparsity $k\geqΩ(\sqrt{n})$. Finally, we present empirical results suggesting that up to sparsity levels $k=O(\sqrt{n})$, recovery is possible by a simple covariance thresholding algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes