DSLGPRFeb 6, 2021

Streaming k-PCA: Efficient guarantees for Oja's algorithm, beyond rank-one updates

arXiv:2102.03646v124 citations
Originality Incremental advance
AI Analysis

This work provides theoretical guarantees for the efficiency of Oja's algorithm for streaming k-PCA, which is important for researchers and practitioners working with high-dimensional streaming data.

This paper analyzes Oja's algorithm for streaming k-PCA, demonstrating that it can approximate the subspace of the top k eigenvectors of the expectation of i.i.d. symmetric matrices with a polylogarithmic dependence on dimension d for the number of samples. This extends previous results which were limited to rank-one updates.

We analyze Oja's algorithm for streaming $k$-PCA and prove that it achieves performance nearly matching that of an optimal offline algorithm. Given access to a sequence of i.i.d. $d \times d$ symmetric matrices, we show that Oja's algorithm can obtain an accurate approximation to the subspace of the top $k$ eigenvectors of their expectation using a number of samples that scales polylogarithmically with $d$. Previously, such a result was only known in the case where the updates have rank one. Our analysis is based on recently developed matrix concentration tools, which allow us to prove strong bounds on the tails of the random matrices which arise in the course of the algorithm's execution.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes