LG DS MLDec 8, 2024

On Socially Fair Low-Rank Approximation and Column Subset Selection

Zhao Song, Ali Vakilian, David P. Woodruff, Samson Zhou

arXiv:2412.06063v16.45 citationsh-index: 13NIPS

Originality Incremental advance

AI Analysis

This addresses fairness in fundamental ML problems for applications with diverse data sub-populations, but the results are incremental as they focus on complexity and algorithmic improvements rather than a breakthrough.

The paper tackles the problem of socially fair low-rank approximation and column subset selection by minimizing loss across all data sub-populations, showing that constant-factor approximation requires exponential time under standard complexity hypotheses, but provides algorithms with improved runtime (e.g., 2^poly(k) for constant groups) and polynomial-time bicriteria approximations.

Low-rank approximation and column subset selection are two fundamental and related problems that are applied across a wealth of machine learning applications. In this paper, we study the question of socially fair low-rank approximation and socially fair column subset selection, where the goal is to minimize the loss over all sub-populations of the data. We show that surprisingly, even constant-factor approximation to fair low-rank approximation requires exponential time under certain standard complexity hypotheses. On the positive side, we give an algorithm for fair low-rank approximation that, for a constant number of groups and constant-factor accuracy, runs in $2^{\text{poly}(k)}$ time rather than the naïve $n^{\text{poly}(k)}$, which is a substantial improvement when the dataset has a large number $n$ of observations. We then show that there exist bicriteria approximation algorithms for fair low-rank approximation and fair column subset selection that run in polynomial time.

View on arXiv PDF

Similar