LGDCSPOCAug 27, 2021

FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis

arXiv:2108.12373v221 citations
Originality Incremental advance
AI Analysis

This work solves the challenge of distributed PCA for large-scale datasets, offering an exact and communication-efficient solution, though it is incremental relative to prior distributed PCA methods.

The paper tackles the problem of performing Principal Component Analysis (PCA) on data distributed across a network, addressing inefficiencies and lack of convergence guarantees in existing methods. It proposes FAST-PCA, which achieves linear and exact convergence to principal components, reducing dimensions and learning uncorrelated features efficiently.

Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often thought of as a dimensionality reduction method, the purpose of PCA is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormity of the dimensions and sample size in the modern day datasets have rendered the centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, those either overlook the uncorrelated feature learning aspect of the PCA, tend to have high communication overhead that makes them inefficient and/or lack `exact' or `global' convergence guarantees. To overcome these aforementioned issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. The claims are further supported by experimental results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes