CYJun 21, 2023
Sampling Individually-Fair Rankings that are Always Group FairSruthi Gorantla, Anay Mehrotra, Amit Deshpande et al. · amazon-science
Rankings on online platforms help their end-users find the relevant information -- people, news, media, and products -- quickly. Fair ranking tasks, which ask to rank a set of items to maximize utility subject to satisfying group-fairness constraints, have gained significant interest in the Algorithmic Fairness, Information Retrieval, and Machine Learning literature. Recent works, however, identify uncertainty in the utilities of items as a primary cause of unfairness and propose introducing randomness in the output. This randomness is carefully chosen to guarantee an adequate representation of each item (while accounting for the uncertainty). However, due to this randomness, the output rankings may violate group fairness constraints. We give an efficient algorithm that samples rankings from an individually-fair distribution while ensuring that every output ranking is group fair. The expected utility of the output ranking is at least $α$ times the utility of the optimal fair solution. Here, $α$ depends on the utilities, position-discounts, and constraints -- it approaches 1 as the range of utilities or the position-discounts shrinks, or when utilities satisfy distributional assumptions. Empirically, we observe that our algorithm achieves individual and group fairness and that Pareto dominates the state-of-the-art baselines.
LGAug 25, 2023
Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post FairnessSruthi Gorantla, Eshaan Bhansali, Amit Deshpande et al. · amazon-science
In learning-to-rank (LTR), optimizing only the relevance (or the expected ranking utility) can cause representational harm to certain categories of items. Moreover, if there is implicit bias in the relevance scores, LTR models may fail to optimize for true relevance. Previous works have proposed efficient algorithms to train stochastic ranking models that achieve fairness of exposure to the groups ex-ante (or, in expectation), which may not guarantee representation fairness to the groups ex-post, that is, after realizing a ranking from the stochastic ranking model. Typically, ex-post fairness is achieved by post-processing, but previous work does not train stochastic ranking models that are aware of this post-processing. In this paper, we propose a novel objective that maximizes expected relevance only over those rankings that satisfy given representation constraints to ensure ex-post fairness. Building upon recent work on an efficient sampler for ex-post group-fair rankings, we propose a group-fair Plackett-Luce model and show that it can be efficiently optimized for our objective in the LTR framework. Experiments on three real-world datasets show that our group-fair algorithm guarantees fairness alongside usually having better relevance compared to the LTR baselines. In addition, our algorithm also achieves better relevance than post-processing baselines, which also ensures ex-post fairness. Further, when implicit bias is injected into the training data, our algorithm typically outperforms existing LTR baselines in relevance.
LGAug 22, 2022
Socially Fair Center-based and Linear Subspace ClusteringSruthi Gorantla, Kishen N. Gowda, Amit Deshpande et al. · amazon-science
Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two most popular techniques to partition real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different clustering cost per point for different sensitive groups can lead to fairness-related harms (e.g., different quality-of-service). The goal of socially fair clustering is to minimize the maximum cost of clustering per point over all groups. In this work, we propose a unified framework to solve socially fair center-based clustering and linear subspace clustering, and give practical, efficient approximation algorithms for these problems. We do extensive experiments to show that on multiple benchmark datasets our algorithms either closely match or outperform state-of-the-art baselines.
LGMar 2, 2022
Sampling Ex-Post Group-Fair RankingsSruthi Gorantla, Amit Deshpande, Anand Louis · amazon-science
Randomized rankings have been of recent interest to achieve ex-ante fairer exposure and better robustness than deterministic rankings. We propose a set of natural axioms for randomized group-fair rankings and prove that there exists a unique distribution $D$ that satisfies our axioms and is supported only over ex-post group-fair rankings, i.e., rankings that satisfy given lower and upper bounds on group-wise representation in the top-$k$ ranks. Our problem formulation works even when there is implicit bias, incomplete relevance information, or only ordinal ranking is available instead of relevance scores or utility values. We propose two algorithms to sample a random group-fair ranking from the distribution $D$ mentioned above. Our first dynamic programming-based algorithm samples ex-post group-fair rankings uniformly at random in time $O(k^2\ell)$, where $\ell$ is the number of groups. Our second random walk-based algorithm samples ex-post group-fair rankings from a distribution $δ$-close to $D$ in total variation distance and has expected running time $O^*(k^2\ell^2)$, when there is a sufficient gap between the given upper and lower bounds on the group-wise representation. The former does exact sampling, but the latter runs significantly faster on real-world data sets for larger values of $k$. We give empirical evidence that our algorithms compare favorably against recent baselines for fairness and ranking utility on real-world data sets.
LGFeb 12, 2023
On Comparing Fair Classifiers under Data BiasMohit Sharma, Amit Deshpande, Rajiv Ratn Shah
In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias in their training data (but not the test data). Our main observations are: 1. The fairness and accuracy of many standard fair classifiers degrade severely as the bias injected in their training data increases, 2. A simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data, and 3. A few, simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
LGApr 26, 2022
One-pass additive-error subset selection for $\ell_{p}$ subspace approximationAmit Deshpande, Rameshwar Pratap
We consider the problem of subset selection for $\ell_{p}$ subspace approximation, that is, to efficiently find a \emph{small} subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. Previously known subset selection algorithms based on volume sampling and adaptive sampling \cite{DeshpandeV07}, for the general case of $p \in [1, \infty)$, require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for $\ell_{p}$ subspace approximation, for any $p \in [1, \infty)$. Earlier subset selection algorithms that give a one-pass multiplicative $(1+ε)$ approximation work under the special cases. Cohen \textit{et al.} \cite{CohenMM17} gives a one-pass subset section that offers multiplicative $(1+ε)$ approximation guarantee for the special case of $\ell_{2}$ subspace approximation. Mahabadi \textit{et al.} \cite{MahabadiRWZ20} gives a one-pass \emph{noisy} subset selection with $(1+ε)$ approximation guarantee for $\ell_{p}$ subspace approximation when $p \in \{1, 2\}$. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any $p \in [1, \infty)$.
LGJun 19, 2023
Causal Effect Regularization: Automated Detection and Removal of Spurious AttributesAbhinav Kumar, Amit Deshpande, Amit Sharma
In many classification datasets, the task labels are spuriously correlated with some input attributes. Classifiers trained on such datasets often rely on these attributes for prediction, especially when the spurious correlation is high, and thus fail to generalize whenever there is a shift in the attributes' correlation at deployment. If we assume that the spurious attributes are known a priori, several methods have been proposed to learn a classifier that is invariant to the specified attributes. However, in real-world data, information about spurious attributes is typically unavailable. Therefore, we propose a method to automatically identify spurious attributes by estimating their causal effect on the label and then use a regularization objective to mitigate the classifier's reliance on them. Compared to a recent method for identifying spurious attributes, we find that our method is more accurate in removing the attribute from the learned model, especially when spurious correlation is high. Specifically, across synthetic, semi-synthetic, and real-world datasets, our method shows significant improvement in a metric used to quantify the dependence of a classifier on spurious attributes ($Δ$Prob), while obtaining better or similar accuracy. In addition, our method mitigates the reliance on spurious attributes even under noisy estimation of causal effects. To explain the empirical robustness of our method, we create a simple linear classification task with two sets of attributes: causal and spurious. We prove that our method only requires that the ranking of estimated causal effects is correct across attributes to select the correct classifier.
LGNov 11, 2022
Identifying, measuring, and mitigating individual unfairness for supervised learning models and application to credit risk modelsRasoul Shahsavarifar, Jithu Chandran, Mario Inchiosa et al.
In the past few years, Artificial Intelligence (AI) has garnered attention from various industries including financial services (FS). AI has made a positive impact in financial services by enhancing productivity and improving risk management. While AI can offer efficient solutions, it has the potential to bring unintended consequences. One such consequence is the pronounced effect of AI-related unfairness and attendant fairness-related harms. These fairness-related harms could involve differential treatment of individuals; for example, unfairly denying a loan to certain individuals or groups of individuals. In this paper, we focus on identifying and mitigating individual unfairness and leveraging some of the recently published techniques in this domain, especially as applicable to the credit adjudication use case. We also investigate the extent to which techniques for achieving individual fairness are effective at achieving group fairness. Our main contribution in this work is functionalizing a two-step training process which involves learning a fair similarity metric from a group sense using a small portion of the raw data and training an individually "fair" classifier using the rest of the data where the sensitive features are excluded. The key characteristic of this two-step technique is related to its flexibility, i.e., the fair metric obtained in the first step can be used with any other individual fairness algorithms in the second step. Furthermore, we developed a second metric (distinct from the fair similarity metric) to determine how fairly a model is treating similar individuals. We use this metric to compare a "fair" model against its baseline model in terms of their individual fairness value. Finally, some experimental results corresponding to the individual unfairness mitigation techniques are presented.
LGSep 6, 2023
Improved Outlier Robust Seeding for k-meansAmit Deshpande, Rameshwar Pratap
The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$ approximation guarantee \cite{AV2007}. However, in the presence of adversarial noise or outliers, $D^{2}$ sampling is more likely to pick centers from distant outliers instead of inlier clusters, and therefore its approximation guarantees \textit{w.r.t.} $k$-means solution on inliers, does not hold. Assuming that the outliers constitute a constant fraction of the given data, we propose a simple variant in the $D^2$ sampling distribution, which makes it robust to the outliers. Our algorithm runs in $O(ndk)$ time, outputs $O(k)$ clusters, discards marginally more points than the optimal number of outliers, and comes with a provable $O(1)$ approximation guarantee. Our algorithm can also be modified to output exactly $k$ clusters instead of $O(k)$ clusters, while keeping its running time linear in $n$ and $d$. This is an improvement over previous results for robust $k$-means based on LP relaxation and rounding \cite{Charikar}, \cite{KrishnaswamyLS18} and \textit{robust $k$-means++} \cite{DeshpandeKP20}. Our empirical results show the advantage of our algorithm over $k$-means++~\cite{AV2007}, uniform random seeding, greedy sampling for $k$ means~\cite{tkmeanspp}, and robust $k$-means++~\cite{DeshpandeKP20}, on standard real-world and synthetic data sets used in previous work. Our proposal is easily amenable to scalable, faster, parallel implementations of $k$-means++ \cite{Bahmani,BachemL017} and is of independent interest for coreset constructions in the presence of outliers \cite{feldman2007ptas,langberg2010universal,feldman2011unified}.
CLFeb 9, 2024Code
NICE: To Optimize In-Context Examples or Not?Pragya Srivastava, Satvik Golechha, Amit Deshpande et al.
Recent work shows that in-context learning and optimization of in-context examples (ICE) can significantly improve the accuracy of large language models (LLMs) on a wide range of tasks, leading to an apparent consensus that ICE optimization is crucial for better performance. However, most of these studies assume a fixed or no instruction provided in the prompt. We challenge this consensus by investigating the necessity of optimizing ICE when task-specific instructions are provided and find that there are many tasks for which it yields diminishing returns. In particular, using a diverse set of tasks and a systematically created instruction set with gradually added details, we find that as the prompt instruction becomes more detailed, the returns on ICE optimization diminish. To characterize this behavior, we introduce a task-specific metric called Normalized Invariability to Choice of Examples (NICE) that quantifies the learnability of tasks from a given instruction, and provides a heuristic to help decide whether to optimize instructions or ICE for a new task. Given a task, the proposed metric can reliably predict the utility of optimizing ICE compared to using random ICE. Our code is available at https://github.com/microsoft/nice-icl.
CLAug 6, 2025Code
Characterizing Deep Research: A Benchmark and Formal DefinitionAbhinav Java, Ashmit Khandelwal, Sukruta Midigeshi et al.
Information tasks such as writing surveys or analytical reports require complex search and reasoning, and have recently been grouped under the umbrella of \textit{deep research} -- a term also adopted by recent models targeting these capabilities. Despite growing interest, the scope of the deep research task remains underdefined and its distinction from other reasoning-intensive problems is poorly understood. In this paper, we propose a formal characterization of the deep research (DR) task and introduce a benchmark to evaluate the performance of DR systems. We argue that the core defining feature of deep research is not the production of lengthy report-style outputs, but rather the high fan-out over concepts required during the search process, i.e., broad and reasoning-intensive exploration. To enable objective evaluation, we define DR using an intermediate output representation that encodes key claims uncovered during search-separating the reasoning challenge from surface-level report generation. Based on this formulation, we propose a diverse, challenging benchmark LiveDRBench with 100 challenging tasks over scientific topics (e.g., datasets, materials discovery, prior art search) and public interest events (e.g., flight incidents, movie awards). Across state-of-the-art DR systems, F1 score ranges between 0.02 and 0.72 for any sub-category. OpenAI's model performs the best with an overall F1 score of 0.55. Analysis of reasoning traces reveals the distribution over the number of referenced sources, branching, and backtracking events executed by current DR systems, motivating future directions for improving their search mechanisms and grounding capabilities. The benchmark is available at https://github.com/microsoft/LiveDRBench.
LGDec 16, 2023Code
Rethinking Robustness of Model AttributionsSandesh Kamath, Sankalp Mittal, Amit Deshpande et al.
For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their explanations (as feature attributions) be robust to small human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile and have proposed improvements in either these methods or the model training. We observe two main causes for fragile attributions: first, the existing metrics of robustness (e.g., top-k intersection) over-penalize even reasonable local shifts in attribution, thereby making random perturbations to appear as a strong attack, and second, the attribution can be concentrated in a small region even when there are multiple important parts in an image. To rectify this, we propose simple ways to strengthen existing metrics and attribution methods that incorporate locality of pixels in robustness metrics and diversity of pixel locations in attributions. Towards the role of model training in attributional robustness, we empirically observe that adversarially trained models have more robust attributions on smaller datasets, however, this advantage disappears in larger datasets. Code is available at https://github.com/ksandeshk/LENS.
LGMay 18, 2020Code
Universalization of any adversarial attack using very few test examplesSandesh Kamath, Amit Deshpande, K V Subrahmanyam et al.
Deep learning models are known to be vulnerable not only to input-dependent adversarial attacks but also to input-agnostic or universal adversarial attacks. Dezfooli et al. \cite{Dezfooli17,Dezfooli17anal} construct universal adversarial attack on a given model by looking at a large number of training data points and the geometry of the decision boundary near them. Subsequent work \cite{Khrulkov18} constructs universal attack by looking only at test examples and intermediate layers of the given model. In this paper, we propose a simple universalization technique to take any input-dependent adversarial attack and construct a universal attack by only looking at very few adversarial test examples. We do not require details of the given model and have negligible computational overhead for universalization. We theoretically justify our universalization technique by a spectral property common to many input-dependent adversarial perturbations, e.g., gradients, Fast Gradient Sign Method (FGSM) and DeepFool. Using matrix concentration inequalities and spectral perturbation bounds, we show that the top singular vector of input-dependent adversarial directions on a small test sample gives an effective and simple universal adversarial attack. For VGG16 and VGG19 models trained on ImageNet, our simple universalization of Gradient, FGSM, and DeepFool perturbations using a test sample of 64 images gives fooling rates comparable to state-of-the-art universal attacks \cite{Dezfooli17,Khrulkov18} for reasonable norms of perturbation. Code available at https://github.com/ksandeshk/svd-uap .
LGDec 16, 2023
How Far Can Fairness Constraints Help Recover From Biased Data?Mohit Sharma, Amit Deshpande
A general belief in fair classification is that fairness constraints incur a trade-off with accuracy, which biased data may worsen. Contrary to this belief, Blum & Stangl (2019) show that fair classification with equal opportunity constraints even on extremely biased data can recover optimally accurate and fair classifiers on the original data distribution. Their result is interesting because it demonstrates that fairness constraints can implicitly rectify data bias and simultaneously overcome a perceived fairness-accuracy trade-off. Their data bias model simulates under-representation and label bias in underprivileged population, and they show the above result on a stylized data distribution with i.i.d. label noise, under simple conditions on the data distribution and bias parameters. We propose a general approach to extend the result of Blum & Stangl (2019) to different fairness constraints, data bias models, data distributions, and hypothesis classes. We strengthen their result, and extend it to the case when their stylized distribution has labels with Massart noise instead of i.i.d. noise. We prove a similar recovery result for arbitrary data distributions using fair reject option classifiers. We further generalize it to arbitrary data distributions and arbitrary hypothesis classes, i.e., we prove that for any data distribution, if the optimally accurate classifier in a given hypothesis class is fair and robust, then it can be recovered through fair classification with equal opportunity constraints on the biased distribution whenever the bias parameters satisfy certain simple conditions. Finally, we show applications of our technique to time-varying data bias in classification and fair machine learning pipelines.
LGOct 4, 2025
Cost Efficient Fairness Audit Under Partial FeedbackNirjhar Das, Mohit Sharma, Praharsh Nanavati et al.
We study the problem of auditing the fairness of a given classifier under partial feedback, where true labels are available only for positively classified individuals, (e.g., loan repayment outcomes are observed only for approved applicants). We introduce a novel cost model for acquiring additional labeled data, designed to more accurately reflect real-world costs such as credit assessment, loan processing, and potential defaults. Our goal is to find optimal fairness audit algorithms that are more cost-effective than random exploration and natural baselines. In our work, we consider two audit settings: a black-box model with no assumptions on the data distribution, and a mixture model, where features and true labels follow a mixture of exponential family distributions. In the black-box setting, we propose a near-optimal auditing algorithm under mild assumptions and show that a natural baseline can be strictly suboptimal. In the mixture model setting, we design a novel algorithm that achieves significantly lower audit cost than the black-box case. Our approach leverages prior work on learning from truncated samples and maximum-a-posteriori oracles, and extends known results on spherical Gaussian mixtures to handle exponential family mixtures, which may be of independent interest. Moreover, our algorithms apply to popular fairness metrics including demographic parity, equal opportunity, and equalized odds. Empirically, we demonstrate strong performance of our algorithms on real-world fair classification datasets like Adult Income and Law School, consistently outperforming natural baselines by around 50% in terms of audit cost.
LGJun 5, 2024
On the Power of Randomization in Fair Classification and RepresentationSushant Agarwal, Amit Deshpande
Fair classification and fair representation learning are two important problems in supervised and unsupervised fair machine learning, respectively. Fair classification asks for a classifier that maximizes accuracy on a given data distribution subject to fairness constraints. Fair representation maps a given data distribution over the original feature space to a distribution over a new representation space such that all classifiers over the representation satisfy fairness. In this paper, we examine the power of randomization in both these problems to minimize the loss of accuracy that results when we impose fairness constraints. Previous work on fair classification has characterized the optimal fair classifiers on a given data distribution that maximize accuracy subject to fairness constraints, e.g., Demographic Parity (DP), Equal Opportunity (EO), and Predictive Equality (PE). We refine these characterizations to demonstrate when the optimal randomized fair classifiers can surpass their deterministic counterparts in accuracy. We also show how the optimal randomized fair classifier that we characterize can be obtained as a solution to a convex optimization problem. Recent work has provided techniques to construct fair representations for a given data distribution such that any classifier over this representation satisfies DP. However, the classifiers on these fair representations either come with no or weak accuracy guarantees when compared to the optimal fair classifier on the original data distribution. Extending our ideas for randomized fair classification, we improve on these works, and construct DP-fair, EO-fair, and PE-fair representations that have provably optimal accuracy and suffer no accuracy loss compared to the optimal DP-fair, EO-fair, and PE-fair classifiers respectively on the original data distribution.
LGJun 19, 2021
Learning and Generalization in Overparameterized Normalizing FlowsKulin Shah, Amit Deshpande, Navin Goyal
In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is a one-hidden-layer overparametrized network. Our main contributions are two-fold: (1) On the one hand, we provide theoretical and empirical evidence that for constrained NFs (this class of NFs underlies many NF constructions) with the one-hidden-layer network, overparametrization hurts training. (2) On the other hand, we prove that unconstrained NFs, a recently introduced model, can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparametrized and has one hidden-layer.
LGMay 31, 2021
Rawlsian Fair Adaptation of Deep Learning ClassifiersKulin Shah, Pooja Gupta, Amit Deshpande et al.
Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.
CGMar 20, 2021
On Subspace Approximation and Subset Selection in Fewer Passes by MCMC SamplingAmit Deshpande, Rameshwar Pratap
We consider the problem of subset selection for $\ell_{p}$ subspace approximation, i.e., given $n$ points in $d$ dimensions, we need to pick a small, representative subset of the given points such that its span gives $(1+ε)$ approximation to the best $k$-dimensional subspace that minimizes the sum of $p$-th powers of distances of all the points to this subspace. Sampling-based subset selection techniques require adaptive sampling iterations with multiple passes over the data. Matrix sketching techniques give a single-pass $(1+ε)$ approximation for $\ell_{p}$ subspace approximation but require additional passes for subset selection. In this work, we propose an MCMC algorithm to reduce the number of passes required by previous subset selection algorithms based on adaptive sampling. For $p=2$, our algorithm gives subset selection of nearly optimal size in only $2$ passes, whereas the number of passes required in previous work depend on $k$. Our algorithm picks a subset of size $\mathrm{poly}(k/ε)$ that gives $(1+ε)$ approximation to the optimal subspace. The running time of the algorithm is $nd + d~\mathrm{poly}(k/ε)$. We extend our results to the case when outliers are present in the datasets, and suggest a two pass algorithm for the same. Our ideas also extend to give a reduction in the number of passes required by adaptive sampling algorithms for $\ell_{p}$ subspace approximation and subset selection, for $p \geq 2$.
LGDec 21, 2020
The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal PerspectiveNaman Goel, Alfonso Amayuelas, Amit Deshpande et al.
Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those who were not. This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed. Using causal graphs, we characterize the missingness mechanisms in different real-world scenarios. We show conditions under which various distributions, used in popular fairness algorithms, can or can not be recovered from the training data. Our theoretical results imply that many of these algorithms can not guarantee fairness in practice. Modeling missingness also helps to identify correct design principles for fair algorithms. For example, in multi-stage settings where decisions are made in multiple screening rounds, we use our framework to derive the minimal distributions required to design a fair algorithm. Our proposed algorithm decentralizes the decision-making process and still achieves similar performance to the optimal algorithm that requires centralization and non-recoverable distributions.
IRSep 24, 2020
On the Problem of Underranking in Group-Fair RankingSruthi Gorantla, Amit Deshpande, Anand Louis
Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes, people. Credit ratings, standardized tests, risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic inequalities, polarize opinions, and reinforce stereotypes. On the other hand, a bias correction for minority groups can cause more harm if perceived as favoring group-fair outcomes over meritocracy. In this paper, we formulate the problem of underranking in group-fair rankings, which was not addressed in previous work. Most group-fair ranking algorithms post-process a given ranking and output a group-fair ranking. We define underranking based on how close the group-fair rank of each item is to its original rank, and prove a lower bound on the trade-off achievable for simultaneous underranking and group fairness in ranking. We give a fair ranking algorithm that takes any given ranking and outputs another ranking with simultaneous underranking and group fairness guarantees comparable to the lower bound we prove. Our algorithm works with group fairness constraints for any number of groups. Our experimental results confirm the theoretical trade-off between underranking and group fairness, and also show that our algorithm achieves the best of both when compared to the state-of-the-art baselines.
CGJun 30, 2020
Subspace approximation with outliersAmit Deshpande, Rameshwar Pratap
The subspace approximation problem with outliers, for given $n$ points in $d$ dimensions $x_{1},\ldots, x_{n} \in R^{d}$, an integer $1 \leq k \leq d$, and an outlier parameter $0 \leq α\leq 1$, is to find a $k$-dimensional linear subspace of $R^{d}$ that minimizes the sum of squared distances to its nearest $(1-α)n$ points. More generally, the $\ell_{p}$ subspace approximation problem with outliers minimizes the sum of $p$-th powers of distances instead of the sum of squared distances. Even the case of robust PCA is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the $(1-α)n$ inliers in the optimal solution are promised to lie exactly on a $k$-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard. We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared distance error of the optimal $k$-dimensional subspace summed over the optimal $(1-α)n$ inliers is at least $δ$ times its squared-error summed over all $n$ points, for some $0 < δ\leq 1 - α$. With this assumption, we give an efficient algorithm to find a subset of $poly(k/ε) \log(1/δ) \log\log(1/δ)$ points whose span contains a $k$-dimensional subspace that gives a multiplicative $(1+ε)$-approximation to the optimal solution. The running time of our algorithm is linear in $n$ and $d$. Interestingly, our results hold even when the fraction of outliers $α$ is large, as long as the obvious condition $0 < δ\leq 1 - α$ is satisfied.
CVJun 20, 2020
How do SGD hyperparameters in natural training affect adversarial robustness?Sandesh Kamath, Amit Deshpande, K V Subrahmanyam
Learning rate, batch size and momentum are three important hyperparameters in the SGD algorithm. It is known from the work of Jastrzebski et al. arXiv:1711.04623 that large batch size training of neural networks yields models which do not generalize well. Yao et al. arXiv:1802.08241 observe that large batch training yields models that have poor adversarial robustness. In the same paper, the authors train models with different batch sizes and compute the eigenvalues of the Hessian of loss function. They observe that as the batch size increases, the dominant eigenvalues of the Hessian become larger. They also show that both adversarial training and small-batch training leads to a drop in the dominant eigenvalues of the Hessian or lowering its spectrum. They combine adversarial training and second order information to come up with a new large-batch training algorithm and obtain robust models with good generalization. In this paper, we empirically observe the effect of the SGD hyperparameters on the accuracy and adversarial robustness of networks trained with unperturbed samples. Jastrzebski et al. considered training models with a fixed learning rate to batch size ratio. They observed that higher the ratio, better is the generalization. We observe that networks trained with constant learning rate to batch size ratio, as proposed in Jastrzebski et al., yield models which generalize well and also have almost constant adversarial robustness, independent of the batch size. We observe that momentum is more effective with varying batch sizes and a fixed learning rate than with constant learning rate to batch size ratio based SGD training.
LGJun 8, 2020
On Universalized Adversarial and Invariant PerturbationsSandesh Kamath, Amit Deshpande, K V Subrahmanyam
Convolutional neural networks or standard CNNs (StdCNNs) are translation-equivariant models that achieve translation invariance when trained on data augmented with sufficient translations. Recent work on equivariant models for a given group of transformations (e.g., rotations) has lead to group-equivariant convolutional neural networks (GCNNs). GCNNs trained on data augmented with sufficient rotations achieve rotation invariance. Recent work by authors arXiv:2002.11318 studies a trade-off between invariance and robustness to adversarial attacks. In another related work arXiv:2005.08632, given any model and any input-dependent attack that satisfies a certain spectral property, the authors propose a universalization technique called SVD-Universal to produce a universal adversarial perturbation by looking at very few test examples. In this paper, we study the effectiveness of SVD-Universal on GCNNs as they gain rotation invariance through higher degree of training augmentation. We empirically observe that as GCNNs gain rotation invariance through training augmented with larger rotations, the fooling rate of SVD-Universal gets better. To understand this phenomenon, we introduce universal invariant directions and study their relation to the universal adversarial direction produced by SVD-Universal.
LGFeb 26, 2020
Can we have it all? On the Trade-off between Spatial and Adversarial Robustness of Neural NetworksSandesh Kamath, Amit Deshpande, K V Subrahmanyam et al.
(Non-)robustness of neural networks to small, adversarial pixel-wise perturbations, and as more recently shown, to even random spatial transformations (e.g., translations, rotations) entreats both theoretical and empirical understanding. Spatial robustness to random translations and rotations is commonly attained via equivariant models (e.g., StdCNNs, GCNNs) and training augmentation, whereas adversarial robustness is typically achieved by adversarial training. In this paper, we prove a quantitative trade-off between spatial and adversarial robustness in a simple statistical setting. We complement this empirically by showing that: (a) as the spatial robustness of equivariant models improves by training augmentation with progressively larger transformations, their adversarial robustness worsens progressively, and (b) as the state-of-the-art robust models are adversarially trained with progressively larger pixel-wise perturbations, their spatial robustness drops progressively. Towards achieving pareto-optimality in this trade-off, we propose a method based on curriculum learning that trains gradually on more difficult perturbations (both spatial and adversarial) to improve spatial and adversarial robustness simultaneously.
AISep 3, 2019
Quantifying Infra-Marginality and Its Trade-off with Group FairnessArpita Biswas, Siddharth Barman, Amit Deshpande et al.
In critical decision-making scenarios, optimizing accuracy can lead to a biased classifier, hence past work recommends enforcing group-based fairness metrics in addition to maximizing accuracy. However, doing so exposes the classifier to another kind of bias called infra-marginality. This refers to individual-level bias where some individuals/subgroups can be worse off than under simply optimizing for accuracy. For instance, a classifier implementing race-based parity may significantly disadvantage women of the advantaged race. To quantify this bias, we propose a general notion of $η$-infra-marginality that can be used to evaluate the extent of this bias. We prove theoretically that, unlike other fairness metrics, infra-marginality does not have a trade-off with accuracy: high accuracy directly leads to low infra-marginality. This observation is confirmed through empirical analysis on multiple simulated and real-world datasets. Further, we find that maximizing group fairness often increases infra-marginality, suggesting the consideration of both group-level fairness and individual-level infra-marginality. However, measuring infra-marginality requires knowledge of the true distribution of individual-level outcomes correctly and explicitly. We propose a practical method to measure infra-marginality, and a simple algorithm to maximize group-wise accuracy and avoid infra-marginality.
DSApr 28, 2018
On Euclidean $k$-Means Clustering with $α$-Center ProximityAmit Deshpande, Anand Louis, Apoorv Vikram Singh
$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $α> 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $α$-center proximity. We give a simple algorithm to find the optimal $α$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(α- 1)$ but linear in the number of points and the dimension. We define an analogous $α$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $α$-center proximity. On the hardness side we show that for any $α' > 1$, there exists an $α\leq α'$, $(α>1)$, and an $\varepsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $α$-center proximity is NP-hard to approximate within a multiplicative $(1+\varepsilon_0)$ factor.
LGFeb 12, 2018
Fair and Diverse DPP-based Data SummarizationL. Elisa Celis, Vijay Keswani, Damian Straszak et al.
Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias (under- or over-representation of a certain gender or race) in such data summarization methods. In this paper we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and corresponding distributions (DPPs) and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Coming up with efficient algorithms to sample from these constrained determinantal distributions, however, suffers from a complexity barrier and we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our experimental results on a real-world and an image dataset show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case, and we also provide a theoretical explanation of it.
LGNov 13, 2016
Batched Gaussian Process Bandit Optimization via Determinantal Point ProcessesTarun Kathuria, Amit Deshpande, Pushmeet Kohli
Gaussian Process bandit optimization has emerged as a powerful tool for optimizing noisy black box functions. One example in machine learning is hyper-parameter optimization where each evaluation of the target function requires training a model which may involve days or even weeks of computation. Most methods for this so-called "Bayesian optimization" only allow sequential exploration of the parameter space. However, it is often desirable to propose batches or sets of parameter values to explore simultaneously, especially when there are large parallel processing facilities at our disposal. Batch methods require modeling the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. In this paper, we propose a new approach for parallelizing Bayesian optimization by modeling the diversity of a batch via Determinantal point processes (DPPs) whose kernels are learned automatically. This allows us to generalize a previous result as well as prove better regret bounds based on DPP sampling. Our experiments on a variety of synthetic and real-world robotics and hyper-parameter optimization tasks indicate that our DPP-based methods, especially those based on DPP sampling, outperform state-of-the-art methods.
LGOct 23, 2016
How to be Fair and Diverse?L. Elisa Celis, Amit Deshpande, Tarun Kathuria et al.
Due to the recent cases of algorithmic bias in data-driven decision-making, machine learning methods are being put under the microscope in order to understand the root cause of these biases and how to correct them. Here, we consider a basic algorithmic task that is central in machine learning: subsampling from a large data set. Subsamples are used both as an end-goal in data summarization (where fairness could either be a legal, political or moral requirement) and to train algorithms (where biases in the samples are often a source of bias in the resulting model). Consequently, there is a growing effort to modify either the subsampling methods or the algorithms themselves in order to ensure fairness. However, in doing so, a question that seems to be overlooked is whether it is possible to produce fair subsamples that are also adequately representative of the feature space of the data set - an important and classic requirement in machine learning. Can diversity and fairness be simultaneously ensured? We start by noting that, in some applications, guaranteeing one does not necessarily guarantee the other, and a new approach is required. Subsequently, we present an algorithmic framework which allows us to produce both fair and diverse samples. Our experimental results on an image summarization task show marked improvements in fairness without compromising feature diversity by much, giving us the best of both the worlds.
DSAug 1, 2016
On the Complexity of Constrained Determinantal Point ProcessesL. Elisa Celis, Amit Deshpande, Tarun Kathuria et al.
Determinantal Point Processes (DPPs) are probabilistic models that arise in quantum physics and random matrix theory and have recently found numerous applications in computer science. DPPs define distributions over subsets of a given ground set, they exhibit interesting properties such as negative correlation, and, unlike other models, have efficient algorithms for sampling. When applied to kernel methods in machine learning, DPPs favor subsets of the given data with more diverse features. However, many real-world applications require efficient algorithms to sample from DPPs with additional constraints on the subset, e.g., partition or matroid constraints that are important to ensure priors, resource or fairness constraints on the sampled subset. Whether one can efficiently sample from DPPs in such constrained settings is an important problem that was first raised in a survey of DPPs by \cite{KuleszaTaskar12} and studied in some recent works in the machine learning literature. The main contribution of our paper is the first resolution of the complexity of sampling from DPPs with constraints. We give exact efficient algorithms for sampling from constrained DPPs when their description is in unary. Furthermore, we prove that when the constraints are specified in binary, this problem is #P-hard via a reduction from the problem of computing mixed discriminants implying that it may be unlikely that there is an FPRAS. Our results benefit from viewing the constrained sampling problem via the lens of polynomials. Consequently, we obtain a few algorithms of independent interest: 1) to count over the base polytope of regular matroids when there are additional (succinct) budget constraints and, 2) to evaluate and compute the mixed characteristic polynomials, that played a central role in the resolution of the Kadison-Singer problem, for certain special cases.
DSJul 6, 2016
On Sampling and Greedy MAP Inference of Constrained Determinantal Point ProcessesTarun Kathuria, Amit Deshpande
Subset selection problems ask for a small, diverse yet representative subset of the given data. When pairwise similarities are captured by a kernel, the determinants of submatrices provide a measure of diversity or independence of items within a subset. Matroid theory gives another notion of independence, thus giving rise to optimization and sampling questions about Determinantal Point Processes (DPPs) under matroid constraints. Partition constraints, as a special case, arise naturally when incorporating additional labeling or clustering information, besides the kernel, in DPPs. Finding the maximum determinant submatrix under matroid constraints on its row/column indices has been previously studied. However, the corresponding question of sampling from DPPs under matroid constraints has been unresolved, beyond the simple cardinality constrained k-DPPs. We give the first polynomial time algorithm to sample exactly from DPPs under partition constraints, for any constant number of partitions. We complement this by a complexity theoretic barrier that rules out such a result under general matroid constraints. Our experiments indicate that partition-constrained DPPs offer more flexibility and more diversity than k-DPPs and their naive extensions, while being reasonably efficient in running time. We also show that a simple greedy initialization followed by local search gives improved approximation guarantees for the problem of MAP inference from k- DPPs on well-conditioned kernels. Our experiments show that this improvement is significant for larger values of k, supporting our theoretical result.