Fair Clustering with Minimum Representation Constraints
This addresses fairness in clustering for applications like electoral districts or playlists, where groups need representation to benefit, though it is incremental as it builds on existing clustering methods with added constraints.
The paper tackles the problem of ensuring each demographic group achieves a minimum representation in at least a specified number of clusters in k-means and k-medians clustering, proposing the MiniReL algorithm that yields fair clusters without increasing clustering cost on benchmark datasets.
Clustering is a well-studied unsupervised learning task that aims to partition data points into a number of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts, playlists, TV channels), where a group (e.g., social or demographic) benefits only if it reaches a minimum level of representation in the cluster (e.g., 50% to elect their preferred candidate). In this paper, we study the k-means and k-medians clustering problems under the additional fairness constraint that each group must attain a minimum level of representation in at least a specified number of clusters. We formulate this problem as a mixed-integer (nonlinear) optimization problem and propose an alternating minimization algorithm, called MiniReL, to solve it. Although incorporating fairness constraints results in an NP-hard assignment problem within the MiniReL algorithm, we present several heuristic strategies that make the approach practical even for large datasets. Numerical results demonstrate that our method yields fair clusters without increasing clustering cost across standard benchmark datasets.