Diptarka Chakraborty

DS
Semantic Scholar Profile
h-index17
6papers
5citations
Novelty63%
AI Score53

6 Papers

DSMay 28
Explaining Rankings with Hidden Group Bonuses

Alvin Hong Yao Yan, Suraj Shetiya, Sujoy Bhore et al.

Determining a linear utility function that correlates with observed candidate rankings is a foundational problem with applications in domains such as admissions, hiring, and recommendation systems, e.g., [Storandt and Funke, AAAI'19, Zhang et al., KDD'23, Wang et al., ICDE'24 (best paper award), Chen and Wong, VLDB'24]. Traditionally, these models assume full visibility into the feature sets used to determine the utility score. However, real-world scenarios often involve sensitive attributes that are hidden or partially observed, yet may influence outcomes through additive bonuses designed to promote fairness, as in [Gale and Marian, ICDE'24]. Motivated by such practical concerns, we study a variant of the ranking explanation problem where sensitive features are unobserved but may influence candidate rankings through group-specific linear boosts. We present a formal framework for modeling this problem and develop an algorithmic solution that leverages constraint satisfaction and automated reasoning techniques to jointly infer the linear scoring parameters and latent group bonuses consistent with the observed rankings. We further show that determining a satisfying linear function with group-specific bonuses is \textsf{NP}-hard in general, but when the feature dimension and the number of groups are constant, the problem admits a polynomial-time solution. Our approach is the first to address this nuanced variant, which captures key real-world challenges in fair ranking and admission systems. We perform extensive experiments on both real-world and synthetic datasets, demonstrating that our method effectively recovers hidden bonus structures and provides faithful explanations of observed ranking outcomes.

DSMay 22
Fairness in Aggregation: Optimal Top-$k$ and Improved Full Ranking

Diptarka Chakraborty, Arya Mazumdar, Barna Saha et al.

Ensuring fairness in algorithmic ranking systems is a critical challenge with significant societal implications for hiring, recommendations, web search, and data management. Standard methods for aggregating multiple preference orders into a consensus ranking may perpetuate and even amplify the lack of representation of underrepresented groups. To address this, recent research has focused on incorporating fairness constraints to ensure the presence of different groups in the top-$k$ positions of the final aggregate ranking. We study two fairness-aware variants under the well-known Spearman footrule, which corresponds to the $L_1$ distance between rankings. First, we address the practically salient task of computing a fair aggregate top-$k$ ranking -- crucial in settings like recommendations and hiring where selection is primarily based on the top-$k$ results -- and present the first optimal algorithm for this problem. Second, we consider fair (full) rank aggregation over all candidates (not specifically on top-$k$). We already know of a $3$-approximation for this fair rank aggregation variant (Wei et al., SIGMOD'22; Chakraborty et al., NeurIPS'22), whereas an exact algorithm exists for the corresponding unconstrained (unfair) version (Dwork et al., WWW'01). Closing the computational gap between fair and unconstrained rank aggregation has remained a tantalizing open problem. We make significant progress by giving a $2$-approximation algorithm for fair (full) rank aggregation, improving substantially over the previous $3$-approximation. Further, we complement our theoretical contributions with experiments on different real-world datasets, which corroborate our theoretical results and demonstrate strong empirical performance relative to state-of-the-art baselines.

DSDec 22, 2025
Clustering with Label Consistency

Diptarka Chakraborty, Hendrik Fichtenberger, Bernhard Haeupler et al.

Designing efficient, effective, and consistent metric clustering algorithms is a significant challenge attracting growing attention. Traditional approaches focus on the stability of cluster centers; unfortunately, this neglects the real-world need for stable point labels, i.e., stable assignments of points to named sets (clusters). In this paper, we address this gap by initiating the study of label-consistent metric clustering. We first introduce a new notion of consistency, measuring the label distance between two consecutive solutions. Then, armed with this new definition, we design new consistent approximation algorithms for the classical $k$-center and $k$-median problems.

LGFeb 12
A Generic Framework for Fair Consensus Clustering in Streams

Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das et al.

Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al, introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.

LGNov 14, 2025
Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications

Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das et al.

Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of \emph{closest fair clustering}, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the \emph{closest fair clustering} problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25]. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the \emph{fair correlation clustering} problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the \emph{fair consensus clustering} problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].

LGJun 10, 2025
Towards Fair Representation: Clustering and Consensus

Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das et al.

Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best represents the collective structure of the data. In this work, we study this fundamental problem through the lens of fair clustering, as introduced by Chierichetti et al. [NeurIPS'17], which incorporates the disparate impact doctrine to ensure proportional representation of each protected group in the dataset within every cluster. Our objective is to find a consensus clustering that is not only representative but also fair with respect to specific protected attributes. To the best of our knowledge, we are the first to address this problem and provide a constant-factor approximation. As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness -- an essential postprocessing step in many clustering applications that require fair representation. We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms for more general scenarios with different proportions of two group sizes. We complement our approximation result by showing that the problem is NP-hard for two unequal-sized groups. Given the fundamental nature of this problem, we believe our results on Closest Fair Clustering could have broader implications for other clustering problems, particularly those for which no prior approximation guarantees exist for their fair variants.