Provable Model-Parallel Distributed Principal Component Analysis with Parallel Deflation
This work addresses a theoretical gap in distributed PCA for scalable machine learning applications, though it is incremental as it builds on existing deflation methods and focuses on theoretical analysis rather than novel practical gains.
The paper tackles the problem of distributed Principal Component Analysis (PCA) by proposing a model-parallel framework where workers asynchronously update distinct eigenvectors using hierarchical peer updates, breaking sequential dependencies in deflation steps. The result is a provably convergent algorithm with low communication cost, demonstrated through experiments to achieve performance comparable to the state-of-the-art EigenGame-μ solver.
We study a distributed Principal Component Analysis (PCA) framework where each worker targets a distinct eigenvector and refines its solution by updating from intermediate solutions provided by peers deemed as "superior". Drawing intuition from the deflation method in centralized eigenvalue problems, our approach breaks the sequential dependency in the deflation steps and allows asynchronous updates of workers, while incurring only a small communication cost. To our knowledge, a gap in the literature -- the theoretical underpinning of such distributed, dynamic interactions among workers -- has remained unaddressed. This paper offers a theoretical analysis explaining why, how, and when these intermediate, hierarchical updates lead to practical and provable convergence in distributed environments. Despite being a theoretical work, our prototype implementation demonstrates that such a distributed PCA algorithm converges effectively and in scalable way: through experiments, our proposed framework offers comparable performance to EigenGame-$μ$, the state-of-the-art model-parallel PCA solver.