LG AI CG MLJun 17, 2020

Socially Fair k-Means Clustering

Mehrdad Ghadiri, Samira Samadi, Santosh Vempala

arXiv:2006.10085v220.220 citationsHas Code

Originality Incremental advance

AI Analysis

This addresses fairness issues in human-centric applications like resource allocation, offering a viable fair alternative to k-means, though it is incremental as it modifies an existing heuristic.

The paper tackled the problem that standard k-means clustering can produce biased outcomes against subgroups, such as demographic groups, by proposing a fair variant called Fair-Lloyd, which ensures equal costs for all groups with negligible runtime increase on benchmark datasets.

We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.

View on arXiv PDF Code

Similar