Antonia Karra

h-index: 50
Papers: 2

2 Papers

43.9 | LG | Jun 3
UniFair: A unified fair clustering approach based on separation and compactness

Antonia Karra, Vasiliki Papanikou, Georgios Vardakas et al.

Clustering is increasingly used to support high-impact decisions, yet standard objectives such as k-means can produce clusterings that treat demographic groups unequally. Existing fair clustering methods typically optimize a single notion of fairness and often overlook how clustering costs interact with the geometry of the induced decision boundaries. We propose UniFair, a unified framework that jointly optimizes separation fairness and social fairness. Separation fairness encourages protected groups to lie farther from the induced decision boundaries, while social fairness reduces disparities in within-cluster distortion by penalizing group-wise clustering costs. We develop gradient-based optimization procedures for separation-fair and unified k-means objectives, and extend them to deep clustering by enforcing the same criteria in the latent space of an autoencoder. Experiments on tabular and image datasets show that UniFair reduces both boundary-related and cost-based group disparities with only a modest increase in clustering loss.
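To make the idea of group-wise clustering cost concrete, the following is a minimal sketch, not the UniFair objective itself. It assumes that the cost of a group is the mean squared Euclidean distance from its points to their nearest center, and that disparity is the gap between the worst-off and best-off group. The function name `groupwise_kmeans_cost` and the toy data are hypothetical.

```python
import numpy as np

def groupwise_kmeans_cost(X, centers, groups):
    """Mean squared distance to the nearest center, computed per group.

    Assumption: this per-group average is only one plausible reading of
    "group-wise clustering cost"; the paper may define it differently.
    """
    # Pairwise squared distances, shape (n_points, n_centers).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Each point pays the cost of its closest center (k-means assignment).
    nearest = d2.min(axis=1)
    return {g: float(nearest[groups == g].mean())
            for g in np.unique(groups).tolist()}

# Toy data: group 0 sits tightly around its center, group 1 does not.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])
groups = np.array([0, 0, 1, 1])

costs = groupwise_kmeans_cost(X, centers, groups)
# Gap between the highest and lowest group cost (hypothetical disparity measure).
disparity = max(costs.values()) - min(costs.values())
print(costs, disparity)
```

A fairness-aware objective could add a penalty on a quantity like `disparity` (or on the maximum group cost) to the usual k-means loss; how UniFair weights and combines such terms is not specified in the abstract.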

LG | Jan 17, 2025
Counterfactual Explanations for k-means and Gaussian Clustering

Georgios Vardakas, Antonia Karra, Evaggelia Pitoura et al.

Counterfactuals have been recognized as an effective approach to explain classifier decisions. Nevertheless, they have not yet been considered in the context of clustering. In this work, we propose the use of counterfactuals to explain clustering solutions. First, we present a general definition for counterfactuals for model-based clustering that includes plausibility and feasibility constraints. Then we consider the counterfactual generation problem for k-means and Gaussian clustering assuming Euclidean distance. Our approach takes as input the factual, the target cluster, a binary mask indicating actionable or immutable features and a plausibility factor specifying how far from the cluster boundary the counterfactual should be placed. In the k-means clustering case, analytical mathematical formulas are presented for computing the optimal solution, while in the Gaussian clustering case (assuming full, diagonal, or spherical covariances) our method requires the numerical solution of a nonlinear equation with a single parameter only. We demonstrate the advantages of our approach through illustrative examples and quantitative experimental comparisons.
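The k-means case can be illustrated with a simplified two-center sketch. This is not the paper's formula: it assumes only the factual's current center and the target center matter, treats the boundary as the perpendicular bisector of the two centers, and interprets the plausibility factor as a Euclidean distance `margin` past that bisector. The function name `kmeans_counterfactual` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_counterfactual(x, c_source, c_target, mask, margin=0.0):
    """Closest point to x that lies `margin` beyond the bisector of the two
    centers, changing only features where mask == 1.

    Assumption: two-center simplification; with more than two clusters,
    crossing this bisector does not guarantee assignment to the target.
    """
    x = np.asarray(x, dtype=float)
    c_source = np.asarray(c_source, dtype=float)
    c_target = np.asarray(c_target, dtype=float)
    mask = np.asarray(mask, dtype=float)

    # Bisector hyperplane: w . z = b, with the target side where w . z > b.
    w = c_target - c_source
    b = 0.5 * (c_target @ c_target - c_source @ c_source)

    # How much w . x must grow to sit `margin` past the boundary.
    gap = b + margin * np.linalg.norm(w) - w @ x
    if gap <= 0:
        return x.copy()  # already far enough inside the target side

    # Only actionable coordinates of w can contribute to the move.
    denom = np.sum(mask * w ** 2)
    if denom == 0:
        raise ValueError("no actionable feature moves x toward the target")

    # Minimum-norm step restricted to the actionable features.
    return x + (gap / denom) * mask * w

cf = kmeans_counterfactual(x=[1.0, 1.0], c_source=[0.0, 0.0],
                           c_target=[4.0, 0.0], mask=[1, 1], margin=0.5)
print(cf)
```

In this toy case the bisector is the line where the first coordinate equals 2, so the counterfactual moves the factual horizontally to a first coordinate of 2.5 and leaves the second coordinate untouched. If the mask marks the first feature as immutable, no feasible move exists and the sketch raises an error.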