PCA in Data-Dependent Noise (Correlated-PCA): Nearly Optimal Finite Sample Guarantees
This work addresses a fundamental problem in statistical estimation for machine learning and data analysis and strengthens the guarantees available for PCA when the noise is correlated with the data, though it is an incremental advance over existing research on correlated-PCA.
The paper tackles Principal Component Analysis (PCA) in the presence of data-dependent noise, where the noise and the true data are correlated, and provides nearly optimal finite-sample guarantees for the singular value decomposition (SVD) solution. Its sample complexity bound significantly improves on prior work while requiring a weaker assumption on data-noise correlation.
We study Principal Component Analysis (PCA) in a setting where a part of the corrupting noise is data-dependent and, as a result, the noise and the true data are correlated. Under a boundedness assumption on the true data and the noise, and a simple assumption on data-noise correlation, we obtain a nearly optimal sample complexity bound for the most commonly used PCA solution, singular value decomposition (SVD). This bound is a significant improvement over the bound obtained by Vaswani and Guo in recent work (NIPS 2016) where this "correlated-PCA" problem was first studied; and it holds under a significantly weaker data-noise correlation assumption than the one used for this earlier result.
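To make the setting concrete, the following is a minimal sketch of SVD-based PCA on data corrupted by data-dependent noise. It assumes a toy model that is not taken from the paper: true data l_t = P a_t in an r-dimensional subspace with bounded coefficients, and noise w_t = M_t l_t, where each M_t is a random diagonal matrix with entries in [0, q]. Because E[M_t] is nonzero, the noise is correlated with the data. The helper names `simulate_correlated_pca`, `pca_svd`, and `subspace_error`, the parameters `n`, `r`, `alpha`, `q`, and the error metric ||(I - P_hat P_hat^T) P||_2 are all illustrative assumptions.

```python
import numpy as np


def simulate_correlated_pca(n=50, r=3, alpha=2000, q=0.1, seed=0):
    """Draw alpha samples y_t = l_t + w_t with data-dependent noise w_t = M_t l_t.

    Hypothetical toy model (an assumption, not the paper's exact model):
    l_t = P a_t with orthonormal P (n x r) and a_t uniform in [-1, 1]^r;
    M_t = diag(d_t) with entries of d_t uniform in [0, q], so ||M_t||_2 <= q.
    """
    rng = np.random.default_rng(seed)
    P, _ = np.linalg.qr(rng.standard_normal((n, r)))  # true subspace basis, n x r
    A = rng.uniform(-1.0, 1.0, size=(r, alpha))      # bounded coefficients a_t
    L = P @ A                                          # true data, one l_t per column
    D = rng.uniform(0.0, q, size=(n, alpha))          # diagonal of each M_t
    W = D * L                                          # w_t = M_t l_t, correlated with l_t
    return P, L + W


def pca_svd(Y, r):
    """PCA via SVD: top-r left singular vectors of the (zero-mean) data matrix Y."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r]


def subspace_error(P_hat, P):
    """Subspace error ||(I - P_hat P_hat^T) P||_2, a value in [0, 1]."""
    return np.linalg.norm(P - P_hat @ (P_hat.T @ P), 2)


P, Y = simulate_correlated_pca()
P_hat = pca_svd(Y, r=3)
print(f"subspace error: {subspace_error(P_hat, P):.4f}")
```

Varying `alpha` (the number of samples) and `q` (the noise-to-data scale) in this sketch shows how the SVD estimate's error depends on them; the toy model does not reproduce the paper's specific assumptions or its bounds.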