ML CY LGOct 28, 2023

Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun

arXiv:2310.18593v19.810 citationsh-index: 6Has Code

Originality Incremental advance

AI Analysis

This work addresses fairness in PCA for streaming data, offering a novel theoretical and algorithmic framework that is incremental but fills key gaps in the literature.

The paper tackles the lack of statistical foundations and memory inefficiency in fair PCA by introducing a new learnability notion (PAFO) and a streaming algorithm (FNPM), achieving theoretical guarantees and demonstrating efficacy on real-world datasets.

Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair in that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire data. On the theoretical side, we rigorously formulate fair PCA using a new notion called \emph{probably approximately fair and optimal} (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitation, we propose a new setting called \emph{fair streaming PCA} along with a memory-efficient algorithm, fair noisy power method (FNPM). We then provide its {\it statistical} guarantee in terms of PAFO-learnability, which is the first of its kind in fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.

View on arXiv PDF Code

Similar