Proportional Fairness in Non-Centroid Clustering
This work addresses fairness in clustering for data science applications, but it is incremental as it builds on prior centroid-based fairness frameworks.
The paper tackles the problem of extending proportional fairness guarantees to non-centroid clustering, where an agent's loss depends on other agents in its cluster, by adapting fairness criteria like the core and fully justified representation (FJR). It shows that achieving the core is challenging, but designs algorithms like GreedyCohesiveClustering for exact FJR and GreedyCapture for a constant approximation, with experiments indicating GreedyCapture improves fairness significantly while maintaining clustering quality.
We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees that become stronger for groups of data points (agents) that are large and cohesive. Prior work applies this framework to centroid clustering, where the loss of an agent is its distance to the centroid assigned to its cluster. We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster, by adapting two proportional fairness criteria -- the core and its relaxation, fully justified representation (FJR) -- to this setting. We show that the core can be approximated only under structured loss functions, and even then, the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm developed for centroid clustering [Chen et al., 2019; Micha and Shah, 2020], is unappealing for a natural loss function. In contrast, we design a new (inefficient) algorithm, GreedyCohesiveClustering, which achieves the relaxation FJR exactly under arbitrary loss functions, and show that the efficient GreedyCapture algorithm achieves a constant approximation of FJR. We also design an efficient auditing algorithm, which estimates the FJR approximation of any given clustering solution up to a constant factor. Our experiments on real data suggest that traditional clustering algorithms are highly unfair, whereas GreedyCapture is considerably fairer and incurs only a modest loss in common clustering objectives.