MLCYLGOct 28, 2023

Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

arXiv:2310.18593v110 citationsh-index: 6
Originality Incremental advance
AI Analysis

This work addresses fairness in PCA for streaming data, offering a novel theoretical and algorithmic framework that is incremental but fills key gaps in the literature.

The paper tackles the lack of statistical foundations and memory inefficiency in fair PCA by introducing a new learnability notion (PAFO) and a streaming algorithm (FNPM), achieving theoretical guarantees and demonstrating efficacy on real-world datasets.

Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair in that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire data. On the theoretical side, we rigorously formulate fair PCA using a new notion called \emph{probably approximately fair and optimal} (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitation, we propose a new setting called \emph{fair streaming PCA} along with a memory-efficient algorithm, fair noisy power method (FNPM). We then provide its {\it statistical} guarantee in terms of PAFO-learnability, which is the first of its kind in fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes