Streaming k-PCA: Efficient guarantees for Oja's algorithm, beyond rank-one updates
This work provides theoretical guarantees for the efficiency of Oja's algorithm for streaming k-PCA, which is important for researchers and practitioners working with high-dimensional streaming data.
This paper analyzes Oja's algorithm for streaming k-PCA, demonstrating that it can approximate the subspace of the top k eigenvectors of the expectation of i.i.d. symmetric matrices with a polylogarithmic dependence on dimension d for the number of samples. This extends previous results which were limited to rank-one updates.
We analyze Oja's algorithm for streaming $k$-PCA and prove that it achieves performance nearly matching that of an optimal offline algorithm. Given access to a sequence of i.i.d. $d \times d$ symmetric matrices, we show that Oja's algorithm can obtain an accurate approximation to the subspace of the top $k$ eigenvectors of their expectation using a number of samples that scales polylogarithmically with $d$. Previously, such a result was only known in the case where the updates have rank one. Our analysis is based on recently developed matrix concentration tools, which allow us to prove strong bounds on the tails of the random matrices which arise in the course of the algorithm's execution.