Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA
This work addresses the need for efficient entrywise uncertainty quantification in streaming PCA, which is incremental as it builds on existing sin-squared error methods but focuses on a largely unexplored aspect.
The paper tackles the problem of providing uncertainty quantification for individual entries of eigenvectors in streaming PCA, deriving sharp concentration bounds and a Central Limit Theorem that achieve optimal error rates up to logarithmic factors, with numerical experiments showing reliable estimates at a fraction of the computational cost of existing methods.
We propose a novel statistical inference framework for streaming principal component analysis (PCA) using Oja's algorithm, enabling the construction of confidence intervals for individual entries of the estimated eigenvector. Most existing works on streaming PCA focus on providing sharp sin-squared error guarantees. Recently, there has been some interest in uncertainty quantification for the sin-squared error. However, uncertainty quantification or sharp error guarantees for entries of the estimated eigenvector in the streaming setting remains largely unexplored. We derive a sharp Bernstein-type concentration bound for elements of the estimated vector matching the optimal error rate up to logarithmic factors. We also establish a Central Limit Theorem for a suitably centered and scaled subset of the entries. To efficiently estimate the coordinate-wise variance, we introduce a provably consistent subsampling algorithm that leverages the median-of-means approach, empirically achieving similar accuracy to multiplier bootstrap methods while being significantly more computationally efficient. Numerical experiments demonstrate its effectiveness in providing reliable uncertainty estimates with a fraction of the computational cost of existing methods.