LGCYMay 29, 2025

A Computational Approach to Improving Fairness in K-means Clustering

arXiv:2505.22984v2h-index: 2
Originality Incremental advance
AI Analysis

This work addresses bias in clustering algorithms for applications involving sensitive data, representing an incremental improvement with domain-specific relevance.

The paper tackles fairness issues in K-means clustering, where clusters may be biased by sensitive attributes like gender or race, by proposing a two-stage optimization method that adjusts cluster memberships for selected data points, resulting in substantial fairness improvements with minimal impact on clustering quality.

The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation. Some cluster may have disproportionately more (or fewer) points from one of the subpopulations in terms of some sensitive variable, e.g., gender or race. Such a fairness issue may cause bias and unexpected social consequences. This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation--clustering first and then adjust cluster membership of a small subset of selected data points. Two computationally efficient algorithms are proposed in identifying those data points that are expensive for fairness, with one focusing on nearest data points outside of a cluster and the other on highly 'mixed' data points. Experiments on benchmark datasets show substantial improvement on fairness with a minimal impact to clustering quality. The proposed algorithms can be easily extended to a broad class of clustering algorithms or fairness metrics.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes