Spectral Guarantees for Adversarial Streaming PCA
This addresses the challenge of efficient streaming PCA under adversarial conditions for data analysis and machine learning, providing new theoretical guarantees and lower bounds.
The paper tackles the problem of estimating the top eigenvector in streaming PCA with adversarial data, showing that a spectral ratio R = O(log n log d) suffices for an insertion-only algorithm to achieve o(1) error, while proving lower bounds that R = Ω(√d) is necessary for mergeable summaries and no o(d^2) space algorithm works for R = O(1).
In streaming PCA, we see a stream of vectors $x_1, \dotsc, x_n \in \mathbb{R}^d$ and want to estimate the top eigenvector of their covariance matrix. This is easier if the spectral ratio $R = λ_1 / λ_2$ is large. We ask: how large does $R$ need to be to solve streaming PCA in $\widetilde{O}(d)$ space? Existing algorithms require $R = \widetildeΩ(d)$. We show: (1) For all mergeable summaries, $R = \widetildeΩ(\sqrt{d})$ is necessary. (2) In the insertion-only model, a variant of Oja's algorithm gets $o(1)$ error for $R = O(\log n \log d)$. (3) No algorithm with $o(d^2)$ space gets $o(1)$ error for $R = O(1)$. Our analysis is the first application of Oja's algorithm to adversarial streams. It is also the first algorithm for adversarial streaming PCA that is designed for a spectral, rather than Frobenius, bound on the tail; and the bound it needs is exponentially better than is possible by adapting a Frobenius guarantee.