ML, LG · May 17, 2019

Online Distributed Estimation of Principal Eigenspaces

arXiv:1905.07389v1
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient large-scale data processing in machine learning, although it appears to be an incremental advance because it builds on existing distributed PCA approaches.

The paper tackles the problem of performing principal component analysis (PCA) in an online and distributed setting, proposing an algorithm that achieves a substantial computational speed-up compared to standard distributed PCA methods while maintaining learning accuracy.

Principal components analysis (PCA) is a widely used dimension reduction technique with an extensive range of applications. In this paper, an online distributed algorithm is proposed for recovering the principal eigenspaces. We further establish its rate of convergence and show how it relates to the number of nodes employed in the distributed computation, the effective rank of the data matrix under consideration, and the gap in the spectrum of the underlying population covariance matrix. The proposed algorithm is illustrated on low-rank approximation and $k$-means clustering tasks. The numerical results show a substantial computational speed-up vis-à-vis standard distributed PCA algorithms, without compromising learning accuracy.
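The abstract does not spell out the update rule, so what follows is a minimal sketch of one generic way to estimate a principal subspace online and in a distributed fashion. It is not the authors' algorithm. Each hypothetical `StreamingNode` consumes its own zero-mean data stream one sample at a time and keeps a running second-moment estimate. A central step then averages the local projectors $V_i V_i^\top$ and takes their top-$k$ eigenvectors, a one-shot averaging scheme swapped in here for illustration. The sketch also computes the two spectral quantities the abstract ties to the convergence rate, using assumed definitions: effective rank as $\operatorname{tr}(\Sigma)/\|\Sigma\|$ and eigengap as $\lambda_k - \lambda_{k+1}$. All names (`StreamingNode`, `aggregate`, `subspace_distance`, and so on) are hypothetical.

```python
import numpy as np


def top_k_eigvecs(mat: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal top-k eigenvectors of a symmetric matrix, as columns."""
    # np.linalg.eigh returns eigenvalues in ascending order, so reverse the columns.
    _, vecs = np.linalg.eigh(mat)
    return vecs[:, ::-1][:, :k]


def effective_rank(cov: np.ndarray) -> float:
    """Assumed definition: trace(cov) divided by the largest eigenvalue."""
    eigvals = np.linalg.eigvalsh(cov)
    return float(eigvals.sum() / eigvals[-1])


def eigengap(cov: np.ndarray, k: int) -> float:
    """Gap between the k-th and (k+1)-th largest eigenvalues."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    return float(eigvals[k - 1] - eigvals[k])


class StreamingNode:
    """Hypothetical worker that sees its data one sample at a time."""

    def __init__(self, dim: int):
        self.count = 0
        self.second_moment = np.zeros((dim, dim))

    def observe(self, x: np.ndarray) -> None:
        # Running average of x x^T; assumes zero-mean data (no centering step).
        self.count += 1
        self.second_moment += (np.outer(x, x) - self.second_moment) / self.count

    def local_subspace(self, k: int) -> np.ndarray:
        return top_k_eigvecs(self.second_moment, k)


def aggregate(bases: list, k: int) -> np.ndarray:
    """Average the local projectors V V^T and return their top-k eigenvectors."""
    avg_proj = sum(v @ v.T for v in bases) / len(bases)
    return top_k_eigvecs(avg_proj, k)


def subspace_distance(v_hat: np.ndarray, v_true: np.ndarray) -> float:
    """Frobenius distance between the two orthogonal projectors."""
    return float(np.linalg.norm(v_hat @ v_hat.T - v_true @ v_true.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, k, num_nodes, per_node = 10, 2, 4, 1000
    # Spiked population covariance: two large eigenvalues, the rest equal to 1.
    pop_eigvals = np.array([10.0, 8.0] + [1.0] * (dim - 2))
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    cov = q @ np.diag(pop_eigvals) @ q.T
    cov = (cov + cov.T) / 2  # remove floating-point asymmetry
    nodes = [StreamingNode(dim) for _ in range(num_nodes)]
    for node in nodes:
        for x in rng.multivariate_normal(np.zeros(dim), cov, size=per_node):
            node.observe(x)
    v_hat = aggregate([node.local_subspace(k) for node in nodes], k)
    print("effective rank:", effective_rank(cov))
    print("eigengap:", eigengap(cov, k))
    print("subspace distance:", subspace_distance(v_hat, q[:, :k]))
```

Averaging projectors rather than raw eigenvector matrices is a deliberate choice. Each local basis $V_i$ is determined only up to sign and rotation, but $V_i V_i^\top$ is not affected by that ambiguity, so the average is well defined.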
