LG CYJun 24, 2024

Efficient k-means with Individual Fairness via Exponential Tilting

Shengkun Zhu, Jinshan Zeng, Yuan Sun, Sheng Wang, Xiaodong Li, Zhiyong Peng

arXiv:2406.16557v12.6

Originality Incremental advance

AI Analysis

This work addresses fairness in clustering for resource allocation scenarios, offering a novel method that improves upon existing approaches, though it appears incremental in its adaptation of k-means with fairness constraints.

The paper tackles the problem of achieving individual fairness in clustering for location-based resource allocation by proposing a novel algorithm, tilted k-means (TKM), which integrates exponential tilting into the sum of squared errors to formulate a tilted SSE objective. The results show that TKM outperforms state-of-the-art methods in effectiveness, fairness, and efficiency, with theoretical guarantees on convergence and fairness improvement, and experiments demonstrating superior performance.

In location-based resource allocation scenarios, the distances between each individual and the facility are desired to be approximately equal, thereby ensuring fairness. Individually fair clustering is often employed to achieve the principle of treating all points equally, which can be applied in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve individual fairness in clustering. We integrate the exponential tilting into the sum of squared errors (SSE) to formulate a novel objective function called tilted SSE. We demonstrate that the tilted SSE can generalize to SSE and employ the coordinate descent and first-order gradient method for optimization. We propose a novel fairness metric, the variance of the distances within each cluster, which can alleviate the Matthew Effect typically caused by existing fairness metrics. Our theoretical analysis demonstrates that the well-known k-means++ incurs a multiplicative error of O(k log k), and we establish the convergence of TKM under mild conditions. In terms of fairness, we prove that the variance generated by TKM decreases with a scaled hyperparameter. In terms of efficiency, we demonstrate the time complexity is linear with the dataset size. Our experiments demonstrate that TKM outperforms state-of-the-art methods in effectiveness, fairness, and efficiency.

View on arXiv PDF

Similar