CY · May 22, 2022
Addressing Strategic Manipulation Disparities in Fair Classification
Vijay Keswani, L. Elisa Celis
In real-world classification settings, such as loan application evaluation or content moderation on online platforms, individuals respond to classifier predictions by strategically updating their features to increase their likelihood of receiving a particular (positive) decision (at a certain cost). Yet, when different demographic groups have different feature distributions or pay different update costs, prior work has shown that individuals from minority groups often pay a higher cost to update their features. Fair classification aims to address such classifier performance disparities by constraining the classifiers to satisfy statistical fairness properties. However, we show that standard fairness constraints do not guarantee that the constrained classifier reduces the disparity in strategic manipulation cost. To address such biases in strategic settings and provide equal opportunities for strategic manipulation, we propose a constrained optimization framework that constructs classifiers that lower the strategic manipulation cost for minority groups. We develop our framework by studying theoretical connections between group-specific strategic cost disparity and standard selection rate fairness metrics (e.g., statistical rate and true positive rate). Empirically, we show the efficacy of this approach over multiple real-world datasets.
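As a rough illustration of the selection rate metrics mentioned in this abstract, the sketch below computes per-group selection rates and their ratio. It assumes one common definition of statistical rate (smallest group selection rate divided by the largest); the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Fraction of positive predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def statistical_rate(predictions, groups):
    """Ratio of the lowest to the highest group selection rate (1.0 means parity)."""
    rates = selection_rates(predictions, groups)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group receives positive predictions, so rates are equal
    return min(rates.values()) / highest

# Toy data (hypothetical): group "a" is selected 3 of 4 times, group "b" 1 of 4 times.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(preds, groups))
print(statistical_rate(preds, groups))
```

A value near 1 indicates similar selection rates across groups. The abstract's point is that a value near 1 does not by itself imply similar strategic manipulation costs.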
CY · Aug 5, 2024
On The Stability of Moral Preferences: A Problem with Computational Elicitation Methods
Kyle Boerstler, Vijay Keswani, Lok Chan et al.
Preference elicitation frameworks feature heavily in the research on participatory ethical AI tools and provide a viable mechanism to enquire and incorporate the moral values of various stakeholders. As part of the elicitation process, surveys about moral preferences, opinions, and judgments are typically administered only once to each participant. This methodological practice is reasonable if participants' responses are stable over time such that, all other relevant factors being held constant, their responses today will be the same as their responses to the same questions at a later time. However, we do not know how often that is the case. It is possible that participants' true moral preferences change, are subject to temporary moods or whims, or are influenced by environmental factors we don't track. If participants' moral responses are unstable in such ways, it would raise important methodological and theoretical issues for how participants' true moral preferences, opinions, and judgments can be ascertained. We address this possibility here by asking the same survey participants the same moral questions about which patient should receive a kidney when only one is available ten times in ten different sessions over two weeks, varying only presentation order across sessions. We measured how often participants gave different responses to simple (Study One) and more complicated (Study Two) repeated scenarios. On average, the fraction of times participants changed their responses to controversial scenarios was around 10-18% across studies, and this instability is observed to have positive associations with response time and decision-making difficulty. We discuss the implications of these results for the efficacy of moral preference elicitation, highlighting the role of response instability in causing value misalignment between stakeholders and AI tools trained on their moral judgments.
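The abstract does not spell out how the fraction of changed responses is computed. One plausible way to quantify it, shown below as an assumption rather than the authors' measure, is the share of a participant's repeated answers that deviate from their most common answer to the same scenario. The function name and the sample responses are hypothetical.

```python
from collections import Counter

def response_instability(responses):
    """Share of repeated responses that differ from the most common response."""
    if not responses:
        raise ValueError("at least one response is required")
    modal_count = Counter(responses).most_common(1)[0][1]
    return (len(responses) - modal_count) / len(responses)

# Hypothetical participant: the same scenario shown in ten sessions.
sessions = ["A", "A", "A", "B", "A", "A", "A", "A", "B", "A"]
print(response_instability(sessions))
```

Other definitions, such as the fraction of consecutive sessions in which the answer flips, would give different numbers for the same data.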
HC · Jul 26, 2024
On the Pros and Cons of Active Learning for Moral Preference Elicitation
Vijay Keswani, Vincent Conitzer, Hoda Heidari et al.
Computational preference elicitation methods are tools used to learn people's preferences quantitatively in a given context. Recent works on preference elicitation advocate for active learning as an efficient method to iteratively construct queries (framed as comparisons between context-specific cases) that are likely to be most informative about an agent's underlying preferences. In this work, we argue that the use of active learning for moral preference elicitation relies on certain assumptions about the underlying moral preferences, which can be violated in practice. Specifically, we highlight the following common assumptions: (a) preferences are stable over time and not sensitive to the sequence of presented queries, (b) the appropriate hypothesis class is chosen to model moral preferences, and (c) noise in the agent's responses is limited. While these assumptions can be appropriate for preference elicitation in certain domains, prior research on moral psychology suggests they may not be valid for moral judgments. Through a synthetic simulation of preferences that violate the above assumptions, we observe that active learning can have similar or worse performance than a basic random query selection method in certain settings. Yet, simulation results also demonstrate that active learning can still be viable if the degree of instability or noise is relatively small and the agent's preferences can be approximately represented with the hypothesis class used for learning. Our study highlights the nuances associated with effective moral preference elicitation in practice and advocates for the cautious use of active learning as a methodology to learn moral preferences.
HC · Nov 13, 2025
Moral Change or Noise? On Problems of Aligning AI With Temporally Unstable Human Feedback
Vijay Keswani, Cyrus Cousins, Breanna Nguyen et al.
Alignment methods in moral domains seek to elicit moral preferences of human stakeholders and incorporate them into AI. This presupposes moral preferences as static targets, but such preferences often evolve over time. Proper alignment of AI to dynamic human preferences should ideally account for "legitimate" changes to moral reasoning, while ignoring changes related to attention deficits, cognitive biases, or other arbitrary factors. However, common AI alignment approaches largely neglect temporal changes in preferences, posing serious challenges to proper alignment, especially in high-stakes applications of AI, e.g., in healthcare domains, where misalignment can jeopardize the trustworthiness of the system and yield serious individual and societal harms. This work investigates the extent to which people's moral preferences change over time, and the impact of such changes on AI alignment. Our study is grounded in the kidney allocation domain, where we elicit responses to pairwise comparisons of hypothetical kidney transplant patients from over 400 participants across 3-5 sessions. We find that, on average, participants change their response to the same scenario presented at different times around 6-20% of the time (exhibiting "response instability"). Additionally, we observe significant shifts in several participants' retrofitted decision-making models over time (capturing "model instability"). The predictive performance of simple AI models decreases as a function of both response and model instability. Moreover, predictive performance diminishes over time, highlighting the importance of accounting for temporal changes in preferences during training. These findings raise fundamental normative and technical challenges relevant to AI alignment, highlighting the need to better understand the object of alignment (what to align to) when user preferences change significantly over time.
HC · Mar 2, 2025
Can AI Model the Complexities of Human Moral Decision-Making? A Qualitative Study of Kidney Allocation Decisions
Vijay Keswani, Vincent Conitzer, Walter Sinnott-Armstrong et al.
A growing body of work in Ethical AI attempts to capture human moral judgments through simple computational models. The key question we address in this work is whether such simple AI models capture the critical nuances of moral decision-making by focusing on the use case of kidney allocation. We conducted twenty interviews where participants explained their rationale for their judgments about who should receive a kidney. We observe participants: (a) value patients' morally-relevant attributes to different degrees; (b) use diverse decision-making processes, citing heuristics to reduce decision complexity; (c) can change their opinions; (d) sometimes lack confidence in their decisions (e.g., due to incomplete information); and (e) express enthusiasm and concern regarding AI assisting humans in kidney allocation decisions. Based on these findings, we discuss challenges of computationally modeling moral judgments as a stand-in for human input, highlight drawbacks of current approaches, and suggest future directions to address these issues.
LG · Feb 17, 2024
Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach
Vijay Keswani, Anay Mehrotra, L. Elisa Celis
In many predictive contexts (e.g., credit lending), true outcomes are only observed for samples that were positively classified in the past. These past observations, in turn, form training datasets for classifiers that make future predictions. However, such training datasets lack information about the outcomes of samples that were (incorrectly) negatively classified in the past and can lead to erroneous classifiers. We present an approach that trains a classifier using available data and comes with a family of exploration strategies to collect outcome data about subpopulations that otherwise would have been ignored. For any exploration strategy, the approach comes with guarantees that (1) all subpopulations are explored, (2) the fraction of false positives is bounded, and (3) the trained classifier converges to a "desired" classifier. The right exploration strategy is context-dependent; it can be chosen to improve learning guarantees and encode context-specific group fairness properties. Evaluation on real-world datasets shows that this approach consistently boosts the quality of collected outcome data and improves the fraction of true positives for all groups, with only a small reduction in predictive utility.
LG · Sep 4, 2025
Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
Cyrus Cousins, Vijay Keswani, Vincent Conitzer et al.
Recent AI work trends towards incorporating human-centric objectives, with the explicit goal of aligning AI models to personal preferences and societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, which are then used to align AI behavior with that of humans. However, models commonly used in such elicitation processes often do not capture the true cognitive processes of human decision making, such as when people use heuristics to simplify information associated with a decision problem. As a result, models learned from people's decisions often do not align with their cognitive processes, and cannot be used to validate the learning framework for generalization to other decision-making tasks. To address this limitation, we take an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. Building on the vast literature characterizing the cognitive processes that contribute to human decision-making, and recent work characterizing such processes in pairwise comparison tasks, we define a class of models in which individual features are first processed and compared across alternatives, and the processed features are then aggregated via a fixed rule, such as the Bradley-Terry rule. This structured processing of information ensures such models are realistic and feasible candidates to represent underlying human decision-making processes. We demonstrate the efficacy of this modeling approach in learning interpretable models of human decision making in a kidney allocation task, and show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.
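The two-stage structure described in this abstract (per-feature comparison, then aggregation by a fixed rule) can be sketched as follows. This is a minimal illustration, not the paper's model class: the feature names, the per-feature rules, and the choice to sum the comparison signals before applying the Bradley-Terry (logistic) rule are all assumptions made for the example.

```python
import math

def bradley_terry(score_a, score_b):
    """Bradley-Terry probability that alternative A is chosen over B."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def choice_probability(a, b, feature_rules):
    """Compare each feature across alternatives, then aggregate the signals."""
    # Stage 1: each rule maps a pair of feature values to a signed comparison signal.
    signals = [rule(a[name], b[name]) for name, rule in feature_rules.items()]
    # Stage 2: fixed aggregation; the summed signal acts as the score difference.
    return bradley_terry(sum(signals), 0.0)

# Hypothetical rules: a sign heuristic for one feature, a scaled difference for another.
rules = {
    "age": lambda x, y: 1.0 if x < y else (-1.0 if x > y else 0.0),
    "dependents": lambda x, y: 0.5 * (x - y),
}
a = {"age": 30, "dependents": 2}
b = {"age": 50, "dependents": 0}
print(round(choice_probability(a, b, rules), 3))  # about 0.881
```

Because both hypothetical rules are antisymmetric in their arguments, swapping the alternatives yields the complementary probability.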
HC · May 31, 2023
Designing Closed-Loop Models for Task Allocation
Vijay Keswani, L. Elisa Celis, Krishnaram Kenthapadi et al.
Automatically assigning tasks to people is challenging because human performance can vary across tasks for many reasons. This challenge is further compounded in real-life settings in which no oracle exists to assess the quality of human decisions and task assignments made. Instead, we find ourselves in a "closed" decision-making loop in which the same fallible human decisions we rely on in practice must also be used to guide task allocation. How can imperfect and potentially biased human decisions train an accurate allocation model? Our key insight is to exploit weak prior information on human-task similarity to bootstrap model training. We show that the use of such a weak prior can improve task allocation accuracy, even when human decision-makers are fallible and biased. We present both theoretical analysis and empirical evaluation over synthetic data and a social media toxicity detection task. Results demonstrate the efficacy of our approach.
HC · Feb 9, 2022
Designing Closed Human-in-the-loop Deferral Pipelines
Vijay Keswani, Matthew Lease, Krishnaram Kenthapadi
In hybrid human-machine deferral frameworks, a classifier can defer uncertain cases to human decision-makers (who are often themselves fallible). Prior work on simultaneous training of such classifier and deferral models has typically assumed access to an oracle during training to obtain true class labels for training samples, but in practice there often is no such oracle. In contrast, we consider a "closed" decision-making pipeline in which the same fallible human decision-makers used in deferral also provide training labels. How can imperfect and biased human expert labels be used to train a fair and accurate deferral framework? Our key insight is that by exploiting weak prior information, we can match experts to input examples to ensure fairness and accuracy of the resulting deferral framework, even when imperfect and biased experts are used in place of ground truth labels. The efficacy of our approach is shown both by theoretical analysis and by evaluation on two tasks.
CY · Jul 15, 2021
Auditing for Diversity using Representative Examples
Vijay Keswani, L. Elisa Celis
Assessing the diversity of a dataset of information associated with people is crucial before using such data for downstream applications. For a given dataset, this often involves computing the imbalance or disparity in the empirical marginal distribution of a protected attribute (e.g., gender or dialect). However, real-world datasets, such as images from Google Search or collections of Twitter posts, often do not have protected attributes labeled. Consequently, to derive disparity measures for such datasets, the elements need to be hand-labeled or crowd-annotated, which are expensive processes. We propose a cost-effective approach to approximate the disparity of a given unlabeled dataset, with respect to a protected attribute, using a control set of labeled representative examples. Our proposed algorithm uses the pairwise similarity between elements in the dataset and elements in the control set to effectively bootstrap an approximation to the disparity of the dataset. Importantly, we show that using a control set whose size is much smaller than the size of the dataset is sufficient to achieve a small approximation error. Further, based on our theoretical framework, we also provide an algorithm to construct adaptive control sets that achieve smaller approximation errors than randomly chosen control sets. Simulations on two image datasets and one Twitter dataset demonstrate the efficacy of our approach (using random and adaptive control sets) in auditing the diversity of a wide variety of datasets.
LG · Feb 25, 2021
Towards Unbiased and Accurate Deferral to Multiple Experts
Vijay Keswani, Matthew Lease, Krishnaram Kenthapadi
Machine learning models are often deployed in concert with humans in the pipeline, with the model having an option to defer to a domain expert in cases where it has low confidence in its inference. Our goal is to design mechanisms for ensuring accuracy and fairness in such prediction systems that combine machine learning model inferences and domain expert predictions. Prior work on "deferral systems" in classification settings has focused on the setting of a pipeline with a single expert and aimed to accommodate the inaccuracies and biases of this expert to simultaneously learn an inference model and a deferral system. Our work extends this framework to settings where multiple experts are available, with each expert having their own domain of expertise and biases. We propose a framework that simultaneously learns a classifier and a deferral system, with the deferral system choosing to defer to one or more human experts on inputs where the classifier has low confidence. We test our framework on a synthetic dataset and a content moderation dataset with biased synthetic experts, and show that it significantly improves the accuracy and fairness of the final predictions, compared to the baselines. We also collect crowdsourced labels for the content moderation task to construct a real-world dataset for the evaluation of hybrid machine-human frameworks and show that our proposed learning framework outperforms baselines on this real-world dataset as well.
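To make the notion of deferral concrete, the sketch below uses a fixed confidence threshold and a majority vote among experts. This is a deliberately simplified stand-in: the paper learns the deferral system jointly with the classifier, whereas the threshold, the voting rule, and the function name here are assumptions for illustration only.

```python
def predict_with_deferral(model_prob, expert_labels, threshold=0.8):
    """Return (label, source) for one binary input.

    model_prob: the model's estimated probability of the positive class.
    expert_labels: non-empty list of 0/1 labels from the available experts.
    """
    confidence = max(model_prob, 1.0 - model_prob)
    if confidence >= threshold:
        # The model is confident enough to answer on its own.
        return int(model_prob >= 0.5), "model"
    # Otherwise defer: majority vote among experts (ties resolve to positive).
    votes = sum(expert_labels)
    return int(2 * votes >= len(expert_labels)), "experts"

print(predict_with_deferral(0.95, [0, 0, 0]))
print(predict_with_deferral(0.60, [0, 0, 1]))
```

A learned deferral system could instead weight experts by their estimated accuracy on each input, which a plain majority vote ignores.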
CY · Jul 15, 2020
Dialect Diversity in Text Summarization on Twitter
Vijay Keswani, L. Elisa Celis
Discussions on Twitter involve participation from different communities with different dialects and it is often necessary to summarize a large number of posts into a representative sample to provide a synopsis. Yet, any such representative sample should sufficiently portray the underlying dialect diversity to present the voices of different participating communities representing the dialects. Extractive summarization algorithms perform the task of constructing subsets that succinctly capture the topic of any given set of posts. However, we observe that there is dialect bias in the summaries generated by common summarization approaches, i.e., they often return summaries that under-represent certain dialects. The vast majority of existing "fair" summarization approaches require socially salient attribute labels (in this case, dialect) to ensure that the generated summary is fair with respect to the socially salient attribute. Nevertheless, in many applications, these labels do not exist. Furthermore, due to the ever-evolving nature of dialects in social media, it is unreasonable to label or accurately infer the dialect of every social media post. To correct for the dialect bias, we employ a framework that takes an existing text summarization algorithm as a blackbox and, using a small set of dialect-diverse sentences, returns a summary that is relatively more dialect-diverse. Crucially, this approach does not need the posts being summarized to have dialect labels, ensuring that the diversification process is independent of dialect classification/identification models. We show the efficacy of our approach on Twitter datasets containing posts written in dialects used by different social groups defined by race or gender; in all cases, our approach leads to improved dialect diversity compared to standard text summarization approaches.
LG · Jun 22, 2020
A Convergent and Dimension-Independent Min-Max Optimization Algorithm
Vijay Keswani, Oren Mangoubi, Sushant Sachdeva et al.
We study a variant of a recently introduced min-max optimization framework where the max-player is constrained to update its parameters in a greedy manner until it reaches a first-order stationary point. Our equilibrium definition for this framework depends on a proposal distribution which the min-player uses to choose directions in which to update its parameters. We show that, given a smooth and bounded nonconvex-nonconcave objective function, access to any proposal distribution for the min-player's updates, and a stochastic gradient oracle for the max-player, our algorithm converges to the aforementioned approximate local equilibrium in a number of iterations that does not depend on the dimension. The equilibrium point found by our algorithm depends on the proposal distribution, and when applying our algorithm to train GANs we choose the proposal distribution to be a distribution of stochastic gradients. We empirically evaluate our algorithm on challenging nonconvex-nonconcave test functions and loss functions arising in GAN training. Our algorithm converges on these test functions and, when used to train GANs, trains stably on synthetic and real-world datasets and avoids mode collapse.
LG · Jun 8, 2020
Fair Classification with Noisy Protected Attributes: A Framework with Provable Guarantees
L. Elisa Celis, Lingxiao Huang, Vijay Keswani et al.
We present an optimization framework for learning a fair classifier in the presence of noisy perturbations in the protected attributes. Compared to prior work, our framework can be employed with a very general class of linear and linear-fractional fairness constraints, can handle multiple, non-binary protected attributes, and outputs a classifier that comes with provable guarantees on both accuracy and fairness. Empirically, we show that our framework can be used to attain either statistical rate or false positive rate fairness guarantees with a minimal loss in accuracy, even when the noise is large, in two real-world datasets.
LG · Jun 5, 2019
Data preprocessing to mitigate bias: A maximum entropy based approach
L. Elisa Celis, Vijay Keswani, Nisheeth K. Vishnoi
Data containing human or social attributes may over- or under-represent groups with respect to salient social attributes such as gender or race, which can lead to biases in downstream applications. This paper presents an algorithmic framework that can be used as a data preprocessing method towards mitigating such bias. Unlike prior work, it can efficiently learn distributions over large domains, controllably adjust the representation rates of protected groups and achieve target fairness metrics such as statistical parity, yet remains close to the empirical distribution induced by the given dataset. Our approach leverages the principle of maximum entropy: amongst all distributions satisfying a given set of constraints, we should choose the one closest in KL-divergence to a given prior. While maximum entropy distributions can succinctly encode distributions over large domains, they can be difficult to compute. Our main contribution is an instantiation of this framework for a set of constraints and priors that encode our bias mitigation goals, which runs in time polynomial in the dimension of the data. Empirically, we observe that samples from the learned distribution have the desired representation rates and statistical rates, and that a classifier trained on these samples incurs only a slight loss in accuracy while maintaining fairness properties.
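The maximum entropy principle stated in this abstract can be written as a constrained optimization problem. The notation below is an assumption introduced for illustration: $q$ is the prior, $\theta$ is a target expectation vector, and $\lambda^{\star}$ denotes the optimal dual variables. For linear expectation constraints of this kind, and when a solution exists, the optimizer takes the standard exponential-family form shown on the right.

```latex
\[
p^{\star} \;=\; \arg\min_{p}\; \mathrm{KL}(p \,\|\, q)
\quad \text{subject to} \quad \mathbb{E}_{x \sim p}[x] = \theta ,
\qquad
p^{\star}(x) \;\propto\; q(x)\, \exp\!\big(\langle \lambda^{\star}, x \rangle\big).
\]
```

Under this reading, the choice of $q$ keeps the learned distribution close to the data, while $\theta$ is where a target such as a desired representation rate would be encoded.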
LG · Jan 29, 2019
Improved Adversarial Learning for Fair Classification
L. Elisa Celis, Vijay Keswani
Motivated by concerns that machine learning algorithms may introduce significant bias in classification models, developing fair classifiers has become an important problem in machine learning research. One important paradigm towards this has been providing algorithms for adversarially learning fair classifiers (Zhang et al., 2018; Madras et al., 2018). We formulate the adversarial learning problem as a multi-objective optimization problem and find the fair model using a gradient descent-ascent algorithm with a modified gradient update step, inspired by the approach of Zhang et al., 2018. We provide theoretical insight and guarantees that formalize the heuristic arguments presented previously towards taking such an approach. We test our approach empirically on the Adult dataset and synthetic datasets and compare against state-of-the-art algorithms (Celis et al., 2018; Zhang et al., 2018; Zafar et al., 2017). The results show that our models and algorithms have comparable or better accuracy than other algorithms while performing better in terms of fairness, as measured using statistical rate or false discovery rate.
LG · Jan 29, 2019
Implicit Diversity in Image Summarization
L. Elisa Celis, Vijay Keswani
Studies have shown that the people depicted in image search results tend to be of majority groups with respect to socially salient attributes. This skew goes beyond that which already exists in the world; for example, Kay et al. showed that although 28% of CEOs in the US are women, only 10% of the top 100 results for CEO in Google Image Search are women. Most existing approaches to correct for this kind of bias assume that the images of people include socially salient attribute labels. However, such labels are often unknown. Further, using automated techniques to infer these labels may often not be possible within acceptable accuracy ranges, and may not be desirable due to the additional biases this process could incur. We develop a novel approach that takes as input a visibly diverse control set of images and uses this set to select a set of images of people in response to a query. The goal is to have a resulting set that is more visibly diverse in a manner that emulates the diversity depicted in the control set. Importantly, this approach does not require images to be labelled at any point; effectively, it gives a way to implicitly diversify the set of images selected. We provide two variants of our approach: the first is a modification of the MMR algorithm to incorporate the diversity scores, and the second is a more efficient variant that does not consider within-list redundancy. We evaluate these approaches empirically on two datasets: 1) a new dataset containing top Google image results for 96 occupations, for which we evaluate gender and skin-tone diversity with respect to occupations, and 2) the CelebA dataset, for which we evaluate gender diversity with respect to facial features. Our approaches produce image sets that significantly improve the visible diversity of the results, compared to current Google search and other diverse image summarization algorithms, at a minimal cost to accuracy.
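For readers unfamiliar with MMR (Maximal Marginal Relevance), the sketch below shows the standard greedy form of the algorithm, which trades off an item's relevance against its similarity to items already selected. It does not include the diversity scores that the paper adds; the function name, the trade-off parameter, and the toy scores are assumptions for illustration.

```python
def mmr_select(relevance, similarity, k, trade_off=0.5):
    """Greedily pick k item indices by standard Maximal Marginal Relevance."""
    selected = []
    candidates = set(range(len(relevance)))
    while candidates and len(selected) < k:
        def marginal(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return trade_off * relevance[i] - (1.0 - trade_off) * redundancy
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy scores (hypothetical): items 0 and 1 are near-duplicates, item 2 is distinct.
relevance = [0.9, 0.85, 0.4]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_select(relevance, similarity, k=2))  # [0, 2]
```

Item 1 is more relevant than item 2 but is skipped in the second round because it is nearly identical to item 0, which is the within-list redundancy that the abstract's second variant chooses not to consider.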
CY · Jun 24, 2018
Balanced News Using Constrained Bandit-based Personalization
Sayash Kapoor, Vijay Keswani, Nisheeth K. Vishnoi et al.
We present a prototype for a news search engine that presents balanced viewpoints across liberal and conservative articles with the goal of de-polarizing content and allowing users to escape their filter bubble. The balancing is done according to flexible user-defined constraints, and leverages recent advances in constrained bandit optimization. We showcase our balanced news feed by displaying it side-by-side with the news feed produced by a traditional (polarized) feed.
LG · Jun 15, 2018
Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees
L. Elisa Celis, Lingxiao Huang, Vijay Keswani et al.
Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which we do not have fair classifiers and many of the aforementioned algorithms do not come with theoretical guarantees; perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results that show that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that were previously not possible.
LG · Feb 12, 2018
Fair and Diverse DPP-based Data Summarization
L. Elisa Celis, Vijay Keswani, Damian Straszak et al.
Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias (under- or over-representation of a certain gender or race) in such data summarization methods. In this paper we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and corresponding distributions (DPPs) and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Coming up with efficient algorithms to sample from these constrained determinantal distributions, however, suffers from a complexity barrier and we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our experimental results on a real-world and an image dataset show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case, and we also provide a theoretical explanation of it.
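The determinantal measure of diversity mentioned in this abstract assigns each subset a weight proportional to the determinant of the corresponding kernel submatrix, so subsets of similar items receive low weight. The sketch below computes that unnormalized weight in plain Python. It covers only the unconstrained measure, not the paper's fairness constraints or sampler, and the kernel values and function names are hypothetical.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n):
                m[row][k] -= factor * m[col][k]
    return det

def dpp_weight(kernel, subset):
    """Unnormalized DPP weight of a subset: det of the kernel restricted to it."""
    sub = [[kernel[i][j] for j in subset] for i in subset]
    return determinant(sub)

# Hypothetical similarity kernel: items 0 and 1 are similar, item 2 is unrelated.
kernel = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(dpp_weight(kernel, [0, 1]))  # low weight: the two items are redundant
print(dpp_weight(kernel, [0, 2]))  # high weight: the two items are diverse
```

Sampling proportionally to these weights therefore favors diverse subsets, which is the behavior the fairness constraints in the paper are layered on top of.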