ML DS LGJan 24, 2019

Fair k-Center Clustering for Data Summarization

Matthäus Kleindessner, Pranjal Awasthi, Jamie Morgenstern

arXiv:1901.08628v226.2189 citationsHas Code

Originality Incremental advance

AI Analysis

This work provides a more efficient solution for fair data summarization, which is incremental as it improves runtime over prior methods.

The paper tackles the problem of fair k-center clustering for data summarization by developing an approximation algorithm that runs in linear time, addressing the gap where existing methods had super-quadratic runtime, with an approximation guarantee incurring only a constant-factor overhead when demographic groups are few.

In data summarization we want to choose $k$ prototypes in order to summarize a data set. We study a setting where the data set comprises several demographic groups and we are restricted to choose $k_i$ prototypes belonging to group $i$. A common approach to the problem without the fairness constraint is to optimize a centroid-based clustering objective such as $k$-center. A natural extension then is to incorporate the fairness constraint into the clustering problem. Existing algorithms for doing so run in time super-quadratic in the size of the data set, which is in contrast to the standard $k$-center problem being approximable in linear time. In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. If the number of demographic groups is small, the approximation guarantee of our algorithm only incurs a constant-factor overhead.

View on arXiv PDF Code

Similar