LGFeb 12
A Generic Framework for Fair Consensus Clustering in StreamsDiptarka Chakraborty, Kushagra Chatterjee, Debarati Das et al.
Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al, introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
LGNov 14, 2025
Generalizing Fair Clustering to Multiple Groups: Algorithms and ApplicationsDiptarka Chakraborty, Kushagra Chatterjee, Debarati Das et al.
Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of \emph{closest fair clustering}, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the \emph{closest fair clustering} problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25]. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the \emph{fair correlation clustering} problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the \emph{fair consensus clustering} problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].
DSMay 10
A Scalable and Unified Framework to Weighted Rank AggregationAmir Carmel, Debarati Das, Tien-Long Nguyen
The rank aggregation problem seeks to combine multiple rank orderings of the same set of candidates into a single consensus ordering. Such problems arise in diverse domains, including web search, employment, college admissions, and voting. In this work we focus on the 1-median objective: given a set of m rankings over [n], the goal is to compute a ranking that minimizes the sum of its distances to all input rankings. We study rank aggregation under several classical distance metrics: Ulam distance, Spearman's footrule, Hamming distance, and Kendall-tau, as well as their weighted variants. Our contributions begin with a novel unified framework that identifies a key structural property: it suffices to focus on a small subset of rankings, where the corresponding local one-median provides a good approximation to the global median. This principle extends across these distance measures, yielding a general algorithmic framework for weighted rank aggregation. Building on this, we present a new approximation algorithm for rank aggregation under the Ulam distance that scales in the Massively Parallel Computation (MPC) model. Our algorithm computes a $(2-α)$-approximation, for a constant $α>0$, to the 1-median in a constant number of rounds, using local memory sublinear in n and total memory near-linear in n. We further design new MPC approximation algorithms for Spearman's footrule and for the element-weighted variants of Hamming and Kendall-tau distances. For each metric, we obtain a $(2-ζ)$-approximation, for a constant $ζ>0$, to the 1-median in a constant number of rounds, using local memory sublinear in n and total memory linear or near-linear in n. Moreover, for the Ulam distance, we simplify and strengthen the analysis of Chakraborty et al., obtaining an improved 1.968-approximation that further extends to the weighted setting.