Fair Sparse Regression with Clustering: An Invex Relaxation for a Combinatorial Problem
This addresses fairness in machine learning for biased datasets with hidden attributes, offering a novel relaxation method with theoretical guarantees, though it is incremental in combining existing concepts.
The paper tackles fair sparse regression on biased datasets with hidden binary attributes by combining sparse regression and clustering, proposing a novel invex relaxation for the combinatorial problem. It achieves exact recovery of the true regression parameter support and hidden attribute values with logarithmic sample complexity in dimension.
In this paper, we study the problem of fair sparse regression on a biased dataset where bias depends upon a hidden binary attribute. The presence of a hidden attribute adds an extra layer of complexity to the problem by combining sparse regression and clustering with unknown binary labels. The corresponding optimization problem is combinatorial, but we propose a novel relaxation of it as an \emph{invex} optimization problem. To the best of our knowledge, this is the first invex relaxation for a combinatorial problem. We show that the inclusion of the debiasing/fairness constraint in our model has no adverse effect on the performance. Rather, it enables the recovery of the hidden attribute. The support of our recovered regression parameter vector matches exactly with the true parameter vector. Moreover, we simultaneously solve the clustering problem by recovering the exact value of the hidden attribute for each sample. Our method uses carefully constructed primal dual witnesses to provide theoretical guarantees for the combinatorial problem. To that end, we show that the sample complexity of our method is logarithmic in terms of the dimension of the regression parameter vector.