Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods
The work addresses a long-standing bottleneck in data analysis for applications where non-Gaussian subspaces are relevant, although the contribution is incremental in that it builds on prior NGCA work.
The paper tackles Non-Gaussian Component Analysis (NGCA) by proposing a new characterization of Gaussian distributions and a Reweighted PCA algorithm, and proves that the algorithm recovers at least one direction in the non-Gaussian subspace with time and sample complexity polynomial in the ambient dimension.
The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-Gaussian distribution. Although this is an appropriate model for some real-world data analysis problems, there has been little progress on this problem over the last decade. In this paper, we attempt to address this state of affairs in two ways. First, we give a new characterization of standard Gaussian distributions in high dimensions, which leads to effective tests for non-Gaussianity. Second, we propose a simple algorithm, \emph{Reweighted PCA}, as a method for solving the NGCA problem. We prove that for a general unknown non-Gaussian distribution, this algorithm recovers at least one direction in $E$, with sample and time complexity depending polynomially on the dimension of the ambient space. We conjecture that the algorithm actually recovers the entire $E$.
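To make the idea of a reweighted spectral test concrete, the following is a minimal sketch and not the paper's exact procedure. The abstract does not specify the weight function; the sketch assumes a radial weight $w(x) = \exp(-\alpha \|x\|^2)$ and data that has already been whitened (zero mean, identity covariance). Under those assumptions, a standard Gaussian has reweighted second-moment matrix $\frac{1}{1+2\alpha} I$, so an eigenvalue far from $\frac{1}{1+2\alpha}$ flags a candidate non-Gaussian direction. The function name `reweighted_pca` and the parameter `alpha` are hypothetical, chosen for illustration only.

```python
import numpy as np


def reweighted_pca(X, alpha=0.5):
    """Sketch of a reweighted spectral test (assumed form, not the paper's).

    X     : (N, n) array of samples, assumed whitened.
    alpha : strength of the assumed radial weight exp(-alpha * ||x||^2).

    Returns the eigenvector whose eigenvalue deviates most from the
    value expected for a standard Gaussian, and the size of that deviation.
    """
    # Assumed radial weights: down-weight points far from the origin.
    w = np.exp(-alpha * np.sum(X**2, axis=1))

    # Weighted second-moment matrix, normalized by the total weight.
    M = (X * w[:, None]).T @ X / w.sum()

    # For x ~ N(0, I), this matrix equals I / (1 + 2 * alpha).
    baseline = 1.0 / (1.0 + 2.0 * alpha)

    # M is symmetric, so eigh applies; eigenvectors are the columns of evecs.
    evals, evecs = np.linalg.eigh(M)
    dev = np.abs(evals - baseline)
    k = int(np.argmax(dev))
    return evecs[:, k], float(dev[k])


if __name__ == "__main__":
    # Synthetic example: coordinate 0 is a +/-1 coin flip (non-Gaussian,
    # unit variance); the remaining coordinates are standard Gaussian.
    rng = np.random.default_rng(0)
    N, n = 20000, 5
    X = rng.standard_normal((N, n))
    X[:, 0] = rng.choice([-1.0, 1.0], size=N)

    direction, deviation = reweighted_pca(X, alpha=0.5)
    print("estimated direction:", np.round(direction, 3))
    print("eigenvalue deviation:", round(deviation, 3))
```

In this synthetic setup the returned direction should align, up to sign, with the first coordinate axis. Because the first coordinate has unit variance, ordinary PCA on the same data would not single it out; the reweighting is what exposes the non-Gaussian direction in this sketch.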