Optimal Spectral Recovery of a Planted Vector in a Subspace
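This work addresses the recovery of a planted vector in a random subspace, a generic task underlying machine learning and statistics problems such as dictionary learning and PCA, and offers incremental improvements in recovery conditions and error bounds over prior work.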
The paper tackles the problem of recovering a planted vector in a random subspace. It shows that a spectral method achieves approximate recovery with high probability when nρ ≪ √N, including exact recovery for dense Bernoulli-Rademacher vectors. It also proves that a large class of spectral methods, and more generally low-degree polynomials, fail at detection when nρ ≫ √N, which suggests computational hardness in that regime.
Recovering a planted vector $v$ in an $n$-dimensional random subspace of $\mathbb{R}^N$ is a generic task related to many problems in machine learning and statistics, such as dictionary learning, subspace recovery, principal component analysis, and non-Gaussian component analysis. In this work, we study computationally efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm. For instance, in the special case where $v$ is an $N\rho$-sparse vector with Bernoulli-Gaussian or Bernoulli-Rademacher entries, our results include the following: (1) We give an improved analysis of a slight variant of the spectral method proposed by Hopkins, Schramm, Shi, and Steurer (2016), showing that it approximately recovers $v$ with high probability in the regime $n\rho \ll \sqrt{N}$. This condition subsumes the conditions $\rho \ll 1/\sqrt{n}$ or $n\sqrt{\rho} \lesssim \sqrt{N}$ required by previous work up to polylogarithmic factors. We achieve $\ell_\infty$ error bounds for the spectral estimator via a leave-one-out analysis, from which it follows that a simple thresholding procedure exactly recovers $v$ with Bernoulli-Rademacher entries, even in the dense case $\rho = 1$. (2) We study the associated detection problem and show that in the regime $n\rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector. This matches the condition for recovery and offers evidence that no polynomial-time algorithm can succeed in recovering a Bernoulli-Gaussian vector $v$ when $n\rho \gg \sqrt{N}$.
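
To make the setting in (1) concrete, the following is a minimal Python sketch of a spectral estimator in the style of Hopkins, Schramm, Shi, and Steurer (2016), not the exact variant analyzed here, which the abstract does not spell out. It assumes the estimator takes the top eigenvector of $\sum_i (\|b_i\|_2^2 - n/N)\, b_i b_i^\top$, where $b_i$ are the rows of an orthonormal basis of the subspace. The function names, the Bernoulli-Rademacher generator, and the parameter values are illustrative assumptions, and the sketch targets only the sparse regime, where the planted direction corresponds to the largest eigenvalue.

```python
import numpy as np


def planted_subspace_basis(N, n, rho, rng):
    """Return an orthonormal basis (N x n) of a subspace containing a planted
    Bernoulli-Rademacher vector v, together with the unit-norm v itself.
    Hypothetical helper for illustration only."""
    support = rng.random(N) < rho              # each entry is nonzero w.p. rho
    signs = rng.choice([-1.0, 1.0], size=N)    # Rademacher signs
    v = support * signs
    v = v / np.linalg.norm(v)                  # assumes the support is nonempty
    G = rng.standard_normal((N, n - 1))        # n - 1 Gaussian directions
    A = np.column_stack([v, G])
    # Mix the columns with a random matrix so that v is not visible as a
    # basis vector, then orthonormalize; the column span is unchanged.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, n)))
    return Q, v


def spectral_estimate(B):
    """Assumed HSSS-style estimator: top eigenvector of
    sum_i (||b_i||^2 - n/N) b_i b_i^T, mapped back to R^N."""
    N, n = B.shape
    weights = np.sum(B**2, axis=1) - n / N     # centered squared row norms
    M = (B * weights[:, None]).T @ B           # n x n symmetric matrix
    _, eigvecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    y = eigvecs[:, -1]                         # eigenvector of largest eigenvalue
    return B @ y                               # unit vector in the subspace


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, v = planted_subspace_basis(N=4000, n=10, rho=0.05, rng=rng)
    u = spectral_estimate(B)
    # The estimate is defined only up to sign, so compare via |<u, v>|.
    print("correlation with planted vector:", abs(u @ v))
```

With these illustrative parameters $n\rho = 0.5$ is far below $\sqrt{N} \approx 63$, so the example sits well inside the recovery regime. For a dense vector such as $\rho = 1$, the $\ell_4$ norm of $v$ falls below that of a Gaussian vector, and a sketch like this would presumably need to select an eigenvector from the other end of the spectrum.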