The Price of Fair PCA: One Extra Dimension
This paper addresses fairness in machine learning for affected populations, offering an algorithmic approach to reducing bias in dimensionality reduction.
PCA can produce biased representations, with higher reconstruction error for one population than for another (for example, women versus men). The paper proposes Fair PCA, which maintains similar fidelity across groups, and shows that it can efficiently generate fair low-dimensional representations of real-world data sets.
We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show that on several real-world data sets, PCA has higher reconstruction error on population A than on population B (for example, women versus men, or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques that maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low-dimensional representation of the data that is nearly optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low-dimensional representation of the data.
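The disparity described in the abstract can be illustrated with a minimal sketch. It is not the paper's Fair PCA algorithm and uses no data from the paper: the two groups are synthetic, and the function names (`top_d_projection`, `avg_reconstruction_error`) and the "excess error" normalization (error under the shared projection minus the error the group would get from its own best rank-d projection) are assumptions made here for illustration. The sketch fits ordinary PCA on the pooled data and then measures reconstruction error separately for each group.

```python
import numpy as np


def top_d_projection(X, d):
    """Projection matrix onto the span of the top-d right singular vectors of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T                      # shape (n_features, d)
    return V @ V.T                    # shape (n_features, n_features)


def avg_reconstruction_error(X, P):
    """Mean squared reconstruction error per sample when rows of X are projected by P."""
    return np.linalg.norm(X - X @ P, "fro") ** 2 / X.shape[0]


rng = np.random.default_rng(0)
n, m, d = 500, 10, 3                  # samples per group, features, target dimension

# Synthetic groups (an assumption for illustration): equal sizes, but their
# variance is concentrated on different coordinate axes.
scale_A = np.full(m, 0.5)
scale_A[:3] = 3.0                     # group A varies strongly on axes 0-2
scale_B = np.full(m, 0.5)
scale_B[3:6] = 2.0                    # group B varies strongly on axes 3-5
A = rng.normal(size=(n, m)) * scale_A
B = rng.normal(size=(n, m)) * scale_B

# Center the pooled data once, then split it back into the two groups.
X = np.vstack([A, B])
X = X - X.mean(axis=0)
A_c, B_c = X[:n], X[n:]

# Ordinary PCA on the pooled data gives a single shared projection.
P = top_d_projection(X, d)
err_A = avg_reconstruction_error(A_c, P)
err_B = avg_reconstruction_error(B_c, P)

# Excess error: how much worse the shared projection is than the best
# rank-d projection fitted to each group alone.
excess_A = err_A - avg_reconstruction_error(A_c, top_d_projection(A_c, d))
excess_B = err_B - avg_reconstruction_error(B_c, top_d_projection(B_c, d))

print(f"group A: error {err_A:.2f}, excess {excess_A:.2f}")
print(f"group B: error {err_B:.2f}, excess {excess_B:.2f}")
```

In this construction the pooled PCA favors the axes on which group A varies, so group B is reconstructed noticeably worse even though both groups contribute the same number of samples. That is the kind of gap a fairness-aware method would aim to balance.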