Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise
This work addresses the correlated-PCA problem and gives improved finite sample guarantees, with applications such as dynamic robust PCA. The contribution is incremental, since it extends the authors' earlier work in which the problem was first studied.
The paper studies Principal Component Analysis (PCA) when the noise is non-isotropic and partly or wholly data-dependent, so that the noise and the true data may be correlated. It provides novel finite sample guarantees that improve upon prior work. In certain regimes, the sample complexity needed to reach a subspace recovery error that is a constant fraction of the noise level is near-optimal. Corollaries cover PCA in sparse data-dependent noise and PCA with missing data.
This work obtains novel finite sample guarantees for Principal Component Analysis (PCA). These hold even when the corrupting noise is non-isotropic, and a part (or all of it) is data-dependent. Because of the latter, in general, the noise and the true data are correlated. The results in this work are a significant improvement over those given in our earlier work where this "correlated-PCA" problem was first studied. In fact, in certain regimes, our results imply that the sample complexity required to achieve subspace recovery error that is a constant fraction of the noise level is near-optimal. Useful corollaries of our result include guarantees for PCA in sparse data-dependent noise and for PCA with missing data. An important application of the former is in proving correctness of the subspace update step of a popular online algorithm for dynamic robust PCA.
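To make the setting concrete, here is a minimal simulation sketch of the missing-data corollary. It is an illustration under stated assumptions, not the paper's algorithm or experimental setup. It assumes the observation model y_t = l_t + w_t, with l_t in an r-dimensional subspace spanned by P, and it models missing entries as the data-dependent noise w_t = -I_T I_T^T l_t, which zeroes the coordinates of l_t in a set T. The function names, the dimensions, the rotating missing-entry pattern, and the use of the sine of the largest principal angle as the error metric are all choices made for this example.

```python
import numpy as np


def subspace_error(P_hat, P):
    """Sine of the largest principal angle: ||(I - P_hat P_hat^T) P||_2."""
    residual = P - P_hat @ (P_hat.T @ P)
    return np.linalg.norm(residual, 2)


def pca_top_r(Y, r):
    """Top-r eigenvectors of the sample covariance (1/n) Y Y^T."""
    n_samples = Y.shape[1]
    cov = (Y @ Y.T) / n_samples
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues returned in ascending order
    return eigvecs[:, -r:]


def simulate_missing_data(n_dim=50, r=3, n_samples=2000, n_missing=5, seed=0):
    """Hypothetical setup: low-rank data with a rotating set of zeroed entries.

    Zeroing entries of l_t is the same as adding noise w_t = -I_T I_T^T l_t,
    so the noise is sparse and depends on the true data.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis for the true subspace (assumed, drawn at random).
    P, _ = np.linalg.qr(rng.standard_normal((n_dim, r)))
    # Bounded, unit-variance subspace coefficients (an assumption of this sketch).
    A = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(r, n_samples))
    L = P @ A          # true data, one column per sample
    Y = L.copy()       # observed data
    for t in range(n_samples):
        # The missing set moves with t, so no coordinate is always unobserved.
        missing = (t * n_missing + np.arange(n_missing)) % n_dim
        Y[missing, t] = 0.0
    return P, L, Y


if __name__ == "__main__":
    P, L, Y = simulate_missing_data()
    print("error from clean data:   ", subspace_error(pca_top_r(L, 3), P))
    print("error from observed data:", subspace_error(pca_top_r(Y, 3), P))
```

Running the script prints the subspace recovery error of plain PCA on the clean and on the corrupted data. Varying `n_samples` or `n_missing` gives an informal feel for how the error depends on the sample size and on the noise level, which is the dependence the guarantees quantify.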