LG AI DSMay 28, 2022

Fair Labeled Clustering

Seyed A. Esmaeili, Sharmila Duppala, John P. Dickerson, Brian Brubach

arXiv:2205.14358v25.86 citationsh-index: 38

Originality Incremental advance

AI Analysis

This work addresses fairness in clustering for decision-making applications, offering efficient solutions for a specific downstream scenario.

The paper tackles the problem of ensuring group fairness in clustering when clusters are assigned labels for downstream decisions, such as hiring, and shows that this setting permits efficient algorithms unlike NP-hard group fair clustering problems.

Numerous algorithms have been produced for the fundamental problem of clustering under many different notions of fairness. Perhaps the most common family of notions currently studied is group fairness, in which proportional group representation is ensured in every cluster. We extend this direction by considering the downstream application of clustering and how group fairness should be ensured for such a setting. Specifically, we consider a common setting in which a decision-maker runs a clustering algorithm, inspects the center of each cluster, and decides an appropriate outcome (label) for its corresponding cluster. In hiring for example, there could be two outcomes, positive (hire) or negative (reject), and each cluster would be assigned one of these two outcomes. To ensure group fairness in such a setting, we would desire proportional group representation in every label but not necessarily in every cluster as is done in group fair clustering. We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions. We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space. We show that this setting exhibits interesting transitions from computationally hard to easy according to additional constraints on the problem. Moreover, when the constraint parameters take on natural values we show a randomized algorithm for this setting that always achieves an optimal clustering and satisfies the fairness constraints in expectation. Finally, we run experiments on real world datasets that validate the effectiveness of our algorithms.

View on arXiv PDF

Similar