LGMay 28, 2022
Fair Labeled ClusteringSeyed A. Esmaeili, Sharmila Duppala, John P. Dickerson et al.
Numerous algorithms have been produced for the fundamental problem of clustering under many different notions of fairness. Perhaps the most common family of notions currently studied is group fairness, in which proportional group representation is ensured in every cluster. We extend this direction by considering the downstream application of clustering and how group fairness should be ensured for such a setting. Specifically, we consider a common setting in which a decision-maker runs a clustering algorithm, inspects the center of each cluster, and decides an appropriate outcome (label) for its corresponding cluster. In hiring for example, there could be two outcomes, positive (hire) or negative (reject), and each cluster would be assigned one of these two outcomes. To ensure group fairness in such a setting, we would desire proportional group representation in every label but not necessarily in every cluster as is done in group fair clustering. We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions. We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space. We show that this setting exhibits interesting transitions from computationally hard to easy according to additional constraints on the problem. Moreover, when the constraint parameters take on natural values we show a randomized algorithm for this setting that always achieves an optimal clustering and satisfies the fairness constraints in expectation. Finally, we run experiments on real world datasets that validate the effectiveness of our algorithms.
GTMar 2, 2022
Implications of Distance over Redistricting Maps: Central and Outlier MapsSeyed A. Esmaeili, Darshan Chakrabarti, Hayley Grape et al.
In representative democracy, a redistricting map is chosen to partition an electorate into districts which each elects a representative. A valid redistricting map must satisfy a collection of constraints such as being compact, contiguous, and of almost-equal population. However, these constraints are loose enough to enable an enormous ensemble of valid redistricting maps. This enables a partisan legislature to gerrymander by choosing a map which unfairly favors it. In this paper, we introduce an interpretable and tractable distance measure over redistricting maps which does not use election results and study its implications over the ensemble of redistricting maps. Specifically, we define a central map which may be considered "most typical" and give a rigorous justification for it by showing that it mirrors the Kemeny ranking in a scenario where we have a committee voting over a collection of redistricting maps to be drawn. We include running time and sample complexity analysis for our algorithms, including some negative results which hold using any algorithm. We further study outlier detection based on this distance measure and show that our framework can detect some gerrymandered maps. More precisely, we show some maps that are widely considered to be gerrymandered that lie very far away from our central maps in comparison to a large ensemble of valid redistricting maps. Since our distance measure does not rely on election results, this gives a significant advantage in gerrymandering detection which is lacking in all previous methods.
GTDec 13, 2023
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic AgentsSeyed A. Esmaeili, Suho Shin, Aleksandrs Slivkins
Motivated by applications such as online labor markets we consider a variant of the stochastic multi-armed bandit problem where we have a collection of arms representing strategic agents with different performance characteristics. The platform (principal) chooses an agent in each round to complete a task. Unlike the standard setting, when an arm is pulled it can modify its reward by absorbing it or improving it at the expense of a higher cost. The principle has to solve a mechanism design problem to incentivize the arms to give their best performance. However, since even with an effective mechanism agents may still deviate from rational behavior, the principal wants a robust algorithm that also gives a non-vacuous guarantee on the total accumulated rewards under non-equilibrium behavior. In this paper, we introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We do this by identifying a collection of intuitive properties that a bandit algorithm has to satisfy to achieve these objectives. Finally, we show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second price auctions with our algorithms.
LGAug 14, 2025
Welfare-Centric ClusteringClaire Jie Zhang, Seyed A. Esmaeili, Jamie Morgenstern
Fair clustering has traditionally focused on ensuring equitable group representation or equalizing group-specific clustering costs. However, Dickerson et al. (2025) recently showed that these fairness notions may yield undesirable or unintuitive clustering outcomes and advocated for a welfare-centric clustering approach that models the utilities of the groups. In this work, we model group utilities based on both distances and proportional representation and formalize two optimization objectives based on welfare-centric clustering: the Rawlsian (Egalitarian) objective and the Utilitarian objective. We introduce novel algorithms for both objectives and prove theoretical guarantees for them. Empirical evaluations on multiple real-world datasets demonstrate that our methods significantly outperform existing fair clustering baselines.
LGJun 22, 2024
Fair Clustering: Critique, Caveats, and Future DirectionsJohn Dickerson, Seyed A. Esmaeili, Jamie Morgenstern et al.
Clustering is a fundamental problem in machine learning and operations research. Therefore, given the fact that fairness considerations have become of paramount importance in algorithm design, fairness in clustering has received significant attention from the research community. The literature on fair clustering has resulted in a collection of interesting fairness notions and elaborate algorithms. In this paper, we take a critical view of fair clustering, identifying a collection of ignored issues such as the lack of a clear utility characterization and the difficulty in accounting for the downstream effects of a fair clustering algorithm in machine learning settings. In some cases, we demonstrate examples where the application of a fair clustering algorithm can have significant negative impacts on social welfare. We end by identifying a collection of steps that would lead towards more impactful research in fair clustering.
GTJun 7, 2024
How to Strategize Human Content Creation in the Era of GenAI?Seyed A. Esmaeili, Kevin Lim, Kshipra Bhawalkar et al.
Generative AI (GenAI) will have significant impact on content creation platforms. In this paper, we study the dynamic competition between a GenAI and a human contributor. Unlike the human, the GenAI's content only improves when more contents are created by the human over time; however, GenAI has the advantage of generating content at a lower cost. We study the algorithmic problem in this dynamic competition model about how the human contributor can maximize her utility when competing against the GenAI for content generation over a set of topics. In time-sensitive content domains (e.g., news or pop music creation) where contents' value diminishes over time, we show that there is no polynomial time algorithm for finding the human's optimal (dynamic) strategy, unless the randomized exponential time hypothesis is false. Fortunately, we are able to design a polynomial time algorithm that naturally cycles between myopically optimizing over a short time window and pausing and provably guarantees an approximation ratio of $\frac{1}{2}$. We then turn to time-insensitive content domains where contents do not lose their value (e.g., contents on history facts). Interestingly, we show that this setting permits a polynomial time algorithm that maximizes the human's utility in the long run. Finally, we conduct simulations that demonstrate the advantage of our algorithms in comparison to a collection of baselines.
LGJun 2, 2024
Robust Fair Clustering with Group Membership Uncertainty SetsSharmila Duppala, Juan Luque, John P. Dickerson et al.
We study the canonical fair clustering problem where each cluster is constrained to have close to population-level representation of each group. Despite significant attention, the salient issue of having incomplete knowledge about the group membership of each point has been superficially addressed. In this paper, we consider a setting where the assigned group memberships are noisy. We introduce a simple noise model that requires a small number of parameters to be given by the decision maker. We then present an algorithm for fair clustering with provable \emph{robustness} guarantees. Our framework enables the decision maker to trade off between the robustness and the clustering quality. Unlike previous work, our algorithms are backed by worst-case theoretical guarantees. Finally, we empirically verify the performance of our algorithm on real world datasets and show its superior performance over existing baselines.
GTDec 28, 2023
Replication-proof Bandit Mechanism Design with Bayesian AgentsSuho Shin, Seyed A. Esmaeili, MohammadTaghi Hajiaghayi
We study the problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms to maximize their payoff. Specifically, we consider Bayesian agents who only know the distribution from which their own arms' mean rewards are sampled, unlike the original setting of by Shin et al. 2022. Interestingly, with Bayesian agents in stark contrast to the previous work, analyzing the replication-proofness of an algorithm becomes significantly complicated even in a single-agent setting. We provide sufficient and necessary conditions for an algorithm to be replication-proof in the single-agent setting, and present an algorithm that satisfies these properties. These results center around several analytical theorems that focus on \emph{comparing the expected regret of multiple bandit instances}, and therefore might be of independent interest since they have not been studied before to the best of our knowledge. We expand this result to the multi-agent setting, and provide a replication-proof algorithm for any problem instance. We finalize our result by proving its sublinear regret upper bound which matches that of Shin et al. 2022.
LGMay 31, 2023
Doubly Constrained Fair ClusteringJohn Dickerson, Seyed A. Esmaeili, Jamie Morgenstern et al.
The remarkable attention which fair clustering has received in the last few years has resulted in a significant number of different notions of fairness. Despite the fact that these notions are well-justified, they are often motivated and studied in a disjoint manner where one fairness desideratum is considered exclusively in isolation from the others. This leaves the understanding of the relations between different fairness notions as an important open problem in fair clustering. In this paper, we take the first step in this direction. Specifically, we consider the two most prominent demographic representation fairness notions in clustering: (1) Group Fairness (GF), where the different demographic groups are supposed to have close to population-level representation in each cluster and (2) Diversity in Center Selection (DS), where the selected centers are supposed to have close to population-level representation of each group. We show that given a constant approximation algorithm for one constraint (GF or DS only) we can obtain a constant approximation solution that satisfies both constraints simultaneously. Interestingly, we prove that any given solution that satisfies the GF constraint can always be post-processed at a bounded degradation to the clustering cost to additionally satisfy the DS constraint while the reverse is not true. Furthermore, we show that both GF and DS are incompatible (having an empty feasibility set in the worst case) with a collection of other distance-based fairness notions. Finally, we carry experiments to validate our theoretical findings.
GTJan 16, 2022
Rawlsian Fairness in Online Bipartite Matching: Two-sided, Group, and IndividualSeyed A. Esmaeili, Sharmila Duppala, Davidson Cheng et al.
Online bipartite-matching platforms are ubiquitous and find applications in important areas such as crowdsourcing and ridesharing. In the most general form, the platform consists of three entities: two sides to be matched and a platform operator that decides the matching. The design of algorithms for such platforms has traditionally focused on the operator's (expected) profit. Since fairness has become an important consideration that was ignored in the existing algorithms a collection of online matching algorithms have been developed that give a fair treatment guarantee for one side of the market at the expense of a drop in the operator's profit. In this paper, we generalize the existing work to offer fair treatment guarantees to both sides of the market simultaneously, at a calculated worst case drop to operator profit. We consider group and individual Rawlsian fairness criteria. Moreover, our algorithms have theoretical guarantees and have adjustable parameters that can be tuned as desired to balance the trade-off between the utilities of the three sides. We also derive hardness results that give clear upper bounds over the performance of any algorithm.
LGJun 14, 2021
Fair Clustering Under a Bounded CostSeyed A. Esmaeili, Brian Brubach, Aravind Srinivasan et al.
Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space. A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness. In this model, the cost of the clustering objective increases due to enforcing fairness in the algorithm. The relative increase in the cost, the ''price of fairness,'' can indeed be unbounded. Therefore, in this paper we propose to treat an upper bound on the clustering objective as a constraint on the clustering problem, and to maximize equality of representation subject to it. We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective. We derive fundamental lower bounds on the approximation of the utilitarian and egalitarian objectives and introduce algorithms with provable guarantees for them. For the leximin objective we introduce an effective heuristic algorithm. We further derive impossibility results for other natural fairness objectives. We conclude with experimental results on real-world datasets that demonstrate the validity of our algorithms.
LGJun 9, 2021
A New Notion of Individually Fair Clustering: $α$-Equitable $k$-CenterDarshan Chakrabarti, John P. Dickerson, Seyed A. Esmaeili et al.
Clustering is a fundamental problem in unsupervised machine learning, and fair variants of it have recently received significant attention due to its societal implications. In this work we introduce a novel definition of individual fairness for clustering problems. Specifically, in our model, each point $j$ has a set of other points $\mathcal{S}_j$ that it perceives as similar to itself, and it feels that it is fairly treated if the quality of service it receives in the solution is $α$-close (in a multiplicative sense, for a given $α\geq 1$) to that of the points in $\mathcal{S}_j$. We begin our study by answering questions regarding the structure of the problem, namely for what values of $α$ the problem is well-defined, and what the behavior of the \emph{Price of Fairness (PoF)} for it is. For the well-defined region of $α$, we provide efficient and easily-implementable approximation algorithms for the $k$-center objective, which in certain cases enjoy bounded-PoF guarantees. We finally complement our analysis by an extensive suite of experiments that validates the effectiveness of our theoretical results.
LGJun 19, 2020
Probabilistic Fair ClusteringSeyed A. Esmaeili, Brian Brubach, Leonidas Tsepenekas et al.
In clustering problems, a central decision-maker is given a complete metric graph over vertices and must provide a clustering of vertices that minimizes some objective function. In fair clustering problems, vertices are endowed with a color (e.g., membership in a group), and the features of a valid clustering might also include the representation of colors in that clustering. Prior work in fair clustering assumes complete knowledge of group membership. In this paper, we generalize prior work by assuming imperfect knowledge of group membership through probabilistic assignments. We present clustering algorithms in this more general setting with approximation ratio guarantees. We also address the problem of "metric membership", where different groups have a notion of order and distance. Experiments are conducted using our proposed algorithms as well as baselines to validate our approach and also surface nuanced concerns when group membership is not known deterministically.
MLMay 25, 2018
An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral AlgorithmChristopher DeCarolis, Mukul Ram, Seyed A. Esmaeili et al.
We provide an end-to-end differentially private spectral algorithm for learning LDA, based on matrix/tensor decompositions, and establish theoretical guarantees on utility/consistency of the estimated model parameters. The spectral algorithm consists of multiple algorithmic steps, named as "{edges}", to which noise could be injected to obtain differential privacy. We identify \emph{subsets of edges}, named as "{configurations}", such that adding noise to all edges in such a subset guarantees differential privacy of the end-to-end spectral algorithm. We characterize the sensitivity of the edges with respect to the input and thus estimate the amount of noise to be added to each edge for any required privacy level. We then characterize the utility loss for each configuration as a function of injected noise. Overall, by combining the sensitivity and utility characterization, we obtain an end-to-end differentially private spectral algorithm for LDA and identify the corresponding configuration that outperforms others in any specific regime. We are the first to achieve utility guarantees under the required level of differential privacy for learning in LDA. Overall our method systematically outperforms differentially private variational inference.
CVDec 14, 2016
Fast-AT: Fast Automatic Thumbnail Generation using Deep Neural NetworksSeyed A. Esmaeili, Bharat Singh, Larry S. Davis
Fast-AT is an automatic thumbnail generation system based on deep neural networks. It is a fully-convolutional deep neural network, which learns specific filters for thumbnails of different sizes and aspect ratios. During inference, the appropriate filter is selected depending on the dimensions of the target thumbnail. Unlike most previous work, Fast-AT does not utilize saliency but addresses the problem directly. In addition, it eliminates the need to conduct region search on the saliency map. The model generalizes to thumbnails of different sizes including those with extreme aspect ratios and can generate thumbnails in real time. A data set of more than 70,000 thumbnail annotations was collected to train Fast-AT. We show competitive results in comparison to existing techniques.