ITMay 28
Local Differential Privacy with Correlated Noise Achieves Central-DP Optimal CostMadhura Pathegama, Srikanth Avasarala, Viveck R. Cadambe et al.
We study privately estimating the sum of $n$ user-held values in the presence of an honest-but-curious server. This motivates requiring privacy not only at data release but also throughout server-side computation. We therefore adopt the local (pure) differential privacy model, in which each user transmits a noise-perturbed value. It is well known that independent local noise typically incurs a substantial utility loss compared to the centralized model, where noise is added only after aggregation. We show that this gap is not fundamental. By carefully designing correlations among the locally added noise variables, we construct $\varepsilon$-DP mechanisms whose estimation cost matches the optimal cost achievable in the centralized setting, up to an arbitrarily small error.
CYMay 26
Fixed Points and Stochastic Meritocracies: A Long-Term PerspectiveGaurab Pokharel, Diptangshu Sen, Sanmay Das et al.
We study group fairness in the context of feedback loops induced by meritocratic selection into programs that themselves confer additional advantage, like college admissions. We introduce a stylized, yet novel inter-generational model for the setting and analyze it in situations where there are no underlying differences between two populations. When the benefit of the program (or the harm of not getting into it) is completely symmetric, we show that disparities between the two populations will vanish on average in the long term, although in the short term disparities will continue to arise and dissipate cyclically. Further, the time an accumulated advantage takes to dissipate can be significant, and increases as a function of the relative importance of the program in conveying benefits. Interestingly, significant disparities can arise purely due to randomness even from completely symmetric initial conditions, especially when populations are small. The introduction of even a slight asymmetry, where the group that has accumulated an advantage becomes slightly preferred, leads to a completely different outcome. In these instances, starting from completely symmetric initial conditions, disparities between groups arise stochastically and then persist over time, yielding a permanent advantage for one group. Our analysis precisely characterizes conditions under which disparities persist or diminish, with a particular focus on the role of the scarcity of available spots in the program and its effectiveness. We also present extensive simulations in a richer model that further support our theoretical results in the simpler, stylized model. Our findings are relevant for the design and implementation of algorithmic fairness interventions in similar selection processes.
GTMay 18
Data Sharing with Endogenous Choices over Differential Privacy LevelsRaef Bassily, Kate Donahue, Diptangshu Sen et al.
Motivated by the rapid push to decentralize sharing of data, we study whether large-scale data sharing coalitions can form in a decentralized manner under differential privacy when players have heterogeneous privacy preferences. We first consider a fully decentralized data-sharing mechanism in which each player decides whether to participate and how much privacy noise to add locally to their sensitive data before sharing. Privacy choices induce a fundamental trade-off: higher privacy lowers individual privacy costs but reduces data utility and statistical accuracy for the coalition. These choices generate externalities across players, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how fully decentralized data-sharing compares to a centralized, socially optimal benchmark when the number of players is large. We provide a comprehensive analysis across multiple privacy-cost regimes corresponding to different attack/observation models in differential privacy, showing that full decentralization is highly inefficient in terms of both social welfare and estimator accuracy. Surprisingly, we find that a simple partially decentralized mechanism (where players still retain participation agency, but a central designer chooses a fixed privacy noise level for everyone) closes this efficiency gap down to constant factors across all privacy-cost regimes.
LGJun 20, 2023
Randomized Quantization is All You Need for Differential Privacy in Federated LearningYeojoon Youn, Zihao Hu, Juba Ziani et al.
Federated learning (FL) is a common and practical framework for learning a machine model in a decentralized fashion. A primary motivation behind this decentralized approach is data privacy, ensuring that the learner never sees the data of each local source itself. Federated learning then comes with two majors challenges: one is handling potentially complex model updates between a server and a large number of data sources; the other is that de-centralization may, in fact, be insufficient for privacy, as the local updates themselves can reveal information about the sources' data. To address these issues, we consider an approach to federated learning that combines quantization and differential privacy. Absent privacy, Federated Learning often relies on quantization to reduce communication complexity. We build upon this approach and develop a new algorithm called the \textbf{R}andomized \textbf{Q}uantization \textbf{M}echanism (RQM), which obtains privacy through a two-levels of randomization. More precisely, we randomly sub-sample feasible quantization levels, then employ a randomized rounding procedure using these sub-sampled discrete levels. We are able to establish that our results preserve ``Renyi differential privacy'' (Renyi DP). We empirically study the performance of our algorithm and demonstrate that compared to previous work it yields improved privacy-accuracy trade-offs for DP federated learning. To the best of our knowledge, this is the first study that solely relies on randomized quantization without incorporating explicit discrete noise to achieve Renyi DP guarantees in Federated Learning systems.
GTMay 25
The Impact of Competition on Outcomes of Score-Based College AdmissionsGeorge Bentley, Diptangshu Sen, Juba Ziani
We study how the design of admissions policies affects the ability of students admitted to universities. In our model, applicants have a multi-dimensional ability, which is a combination of a "type" and a "soft skill." Universities may differ in how they evaluate quality and have differing preferences on type and soft skills. Then, university admissions rely on a single noisy aggregate signal, such as a test score, that may not fully align with the university's preferences, and a university evaluates applicants through the posterior expectations of their preference metric given the observed signal. Our main results highlight that the design of good admission policies can be counter-intuitive. Under a single university, when holding the number of qualified applicants constant, increasing the usefulness of the signal (by aligning it more closely with the university preferences) leads to a worse type and soft skill for admitted students. Further, a university cannot affect the composition of students that are strong on type versus soft skills by changing their preferences. The picture becomes even more complicated under competition between as few as two universities: self-selection effects among students admitted to both universities can lead to part of the applicant pool switching which university they prefer, even under small changes in the design of the noisy signal. This can, in particular, lead to sudden and non-monotonic loss in the quality of admitted students when changing the alignment between signal and university preferences. Further, a university can get more students by increasing their selectivity. Finally, when admissions rely on separate noisy scores for type and for soft skills, we show that universities that put more emphasis on type (respectively soft skills) end up, counter-intuitively, admitting students with higher soft skills (respectively type).
GTSep 6, 2024
Algorithmic Collusion Without ThreatsEshwar Ram Arunachaleswaran, Natalie Collina, Sampath Kannan et al.
There has been substantial recent concern that pricing algorithms might learn to ``collude.'' Supra-competitive prices can emerge as a Nash equilibrium of repeated pricing games, in which sellers play strategies which threaten to punish their competitors who refuse to support high prices, and these strategies can be automatically learned. In fact, a standard economic intuition is that supra-competitive prices emerge from either the use of threats, or a failure of one party to optimize their payoff. Is this intuition correct? Would preventing threats in algorithmic decision-making prevent supra-competitive prices when sellers are optimizing for their own revenue? No. We show that supra-competitive prices can emerge even when both players are using algorithms which do not encode threats, and which optimize for their own revenue. We study sequential pricing games in which a first mover deploys an algorithm and then a second mover optimizes within the resulting environment. We show that if the first mover deploys any algorithm with a no-regret guarantee, and then the second mover even approximately optimizes within this now static environment, monopoly-like prices arise. The result holds for any no-regret learning algorithm deployed by the first mover and for any pricing policy of the second mover that obtains them profit at least as high as a random pricing would -- and hence the result applies even when the second mover is optimizing only within a space of non-responsive pricing distributions which are incapable of encoding threats. In fact, there exists a set of strategies, neither of which explicitly encode threats that form a Nash equilibrium of the simultaneous pricing game in algorithm space, and lead to near monopoly prices. This suggests that the definition of ``algorithmic collusion'' may need to be expanded, to include strategies without explicitly encoded threats.
LGJan 31, 2023
Sequential Strategic ScreeningLee Cohen, Saeed Sharifi-Malvajerdi, Kevin Stangl et al.
We initiate the study of strategic behavior in screening processes with multiple classifiers. We focus on two contrasting settings: a conjunctive setting in which an individual must satisfy all classifiers simultaneously, and a sequential setting in which an individual to succeed must satisfy classifiers one at a time. In other words, we introduce the combination of strategic classification with screening processes. We show that sequential screening pipelines exhibit new and surprising behavior where individuals can exploit the sequential ordering of the tests to zig-zag between classifiers without having to simultaneously satisfy all of them. We demonstrate an individual can obtain a positive outcome using a limited manipulation budget even when far from the intersection of the positive regions of every classifier. Finally, we consider a learner whose goal is to design a sequential screening process that is robust to such manipulations, and provide a construction for the learner that optimizes a natural objective.
CRAug 16, 2024
Fairness Issues and Mitigations in (Differentially Private) Socio-Demographic Data ProcessesJoonhyuk Ko, Juba Ziani, Saswat Das et al.
Statistical agencies rely on sampling techniques to collect socio-demographic data crucial for policy-making and resource allocation. This paper shows that surveys of important societal relevance introduce sampling errors that unevenly impact group-level estimates, thereby compromising fairness in downstream decisions. To address these issues, this paper introduces an optimization approach modeled on real-world survey design processes, ensuring sampling costs are optimized while maintaining error margins within prescribed tolerances. Additionally, privacy-preserving methods used to determine sampling rates can further impact these fairness issues. This paper explores the impact of differential privacy on the statistics informing the sampling process, revealing a surprising effect: not only is the expected negative effect from the addition of noise for differential privacy negligible, but also this privacy noise can in fact reduce unfairness as it positively biases smaller counts. These findings are validated over an extensive analysis using datasets commonly applied in census statistics.
LGOct 7, 2023
Oracle Efficient Algorithms for Groupwise RegretKrishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan et al.
We study the problem of online prediction, in which at each time step $t$, an individual $x_t$ arrives, whose label we must predict. Each individual is associated with various groups, defined based on their features such as age, sex, race etc., which may intersect. Our goal is to make predictions that have regret guarantees not just overall but also simultaneously on each sub-sequence comprised of the members of any single group. Previous work such as [Blum & Lykouris] and [Lee et al] provide attractive regret guarantees for these problems; however, these are computationally intractable on large model classes. We show that a simple modification of the sleeping experts technique of [Blum & Lykouris] yields an efficient reduction to the well-understood problem of obtaining diminishing external regret absent group considerations. Our approach gives similar regret guarantees compared to [Blum & Lykouris]; however, we run in time linear in the number of groups, and are oracle-efficient in the hypothesis class. This in particular implies that our algorithm is efficient whenever the number of groups is polynomially bounded and the external-regret problem can be solved efficiently, an improvement on [Blum & Lykouris]'s stronger condition that the model class must be small. Our approach can handle online linear regression and online combinatorial optimization problems like online shortest paths. Beyond providing theoretical regret bounds, we evaluate this algorithm with an extensive set of experiments on synthetic data and on two real data sets -- Medical costs and the Adult income dataset, both instantiated with intersecting groups defined in terms of race, sex, and other demographic characteristics. We find that uniformly across groups, our algorithm gives substantial error improvements compared to running a standard online linear regression algorithm with no groupwise regret guarantees.
CRAug 8, 2024
Differentially Private Data Release on Graphs: Inefficiencies and UnfairnessFerdinando Fioretto, Diptangshu Sen, Juba Ziani
Networks are crucial components of many sectors, including telecommunications, healthcare, finance, energy, and transportation.The information carried in such networks often contains sensitive user data, like location data for commuters and packet data for online users. Therefore, when considering data release for networks, one must ensure that data release mechanisms do not leak information about individuals, quantified in a precise mathematical sense. Differential Privacy (DP) is the widely accepted, formal, state-of-the-art technique, which has found use in a variety of real-life settings including the 2020 U.S. Census, Apple users' device data, or Google's location data. Yet, the use of DP comes with new challenges, as the noise added for privacy introduces inaccuracies or biases and further, DP techniques can also distribute these biases disproportionately across different populations, inducing fairness issues. The goal of this paper is to characterize the impact of DP on bias and unfairness in the context of releasing information about networks, taking a departure from previous work which has studied these effects in the context of private population counts release (such as in the U.S. Census). To this end, we consider a network release problem where the network structure is known to all, but the weights on edges must be released privately. We consider the impact of this private release on a simple downstream decision-making task run by a third-party, which is to find the shortest path between any two pairs of nodes and recommend the best route to users. This setting is of highly practical relevance, mirroring scenarios in transportation networks, where preserving privacy while providing accurate routing information is crucial. Our work provides theoretical foundations and empirical evidence into the bias and unfairness arising due to privacy in these networked decision problems.
CRMay 13
Limits of Personalizing Differential Privacy BudgetsEdwige Cyffers, Juba Ziani
A key technical difficulty in differential privacy is selecting a privacy budget that satisfies privacy requirements while maximizing utility. A natural and well-studied workaround is to use personalized privacy budgets, which may differ across agents. In this paper, we show that personalized budgets come with major limitations and that for mean estimation, the dominant factor is not full personalization, but rather choosing the right effective privacy budget. This can be achieved through a simple thresholding operator that we describe. Compared with this thresholding baseline, the gains obtained by fully personalized mechanisms are limited. In particular, we precisely quantify the constant-factor improvement in settings with mixed private and public datasets and in private datasets with two levels of privacy requirements. We also establish upper bounds and identify regimes of maximal gain for arbitrary privacy requirements.
LGFeb 13, 2024
Bayesian Strategic ClassificationLee Cohen, Saeed Sharifi-Malvajerdi, Kevin Stangl et al.
In strategic classification, agents modify their features, at a cost, to ideally obtain a positive classification from the learner's classifier. The typical response of the learner is to carefully modify their classifier to be robust to such strategic behavior. When reasoning about agent manipulations, most papers that study strategic classification rely on the following strong assumption: agents fully know the exact parameters of the deployed classifier by the learner. This often is an unrealistic assumption when using complex or proprietary machine learning techniques in real-world prediction tasks. We initiate the study of partial information release by the learner in strategic classification. We move away from the traditional assumption that agents have full knowledge of the classifier. Instead, we consider agents that have a common distributional prior on which classifier the learner is using. The learner in our model can reveal truthful, yet not necessarily complete, information about the deployed classifier to the agents. The learner's goal is to release just enough information about the classifier to maximize accuracy. We show how such partial information release can, counter-intuitively, benefit the learner's accuracy, despite increasing agents' abilities to manipulate. We show that while it is intractable to compute the best response of an agent in the general case, there exist oracle-efficient algorithms that can solve the best response of the agents when the learner's hypothesis class is the class of linear classifiers, or when the agents' cost function satisfies a natural notion of submodularity as we define. We then turn our attention to the learner's optimization problem and provide both positive and negative results on the algorithmic problem of how much information the learner should release about the classifier to maximize their expected accuracy.
LGJan 30, 2024
Personalized Differential Privacy for Ridge RegressionKrishna Acharya, Franziska Boenisch, Rakshit Naidu et al.
The increased application of machine learning (ML) in sensitive domains requires protecting the training data through privacy frameworks, such as differential privacy (DP). DP requires to specify a uniform privacy level $\varepsilon$ that expresses the maximum privacy loss that each data point in the entire dataset is willing to tolerate. Yet, in practice, different data points often have different privacy requirements. Having to set one uniform privacy level is usually too restrictive, often forcing a learner to guarantee the stringent privacy requirement, at a large cost to accuracy. To overcome this limitation, we introduce our novel Personalized-DP Output Perturbation method (PDP-OP) that enables to train Ridge regression models with individual per data point privacy levels. We provide rigorous privacy proofs for our PDP-OP as well as accuracy guarantees for the resulting model. This work is the first to provide such theoretical accuracy guarantees when it comes to personalized DP in machine learning, whereas previous work only provided empirical evaluations. We empirically evaluate PDP-OP on synthetic and real datasets and with diverse privacy distributions. We show that by enabling each data point to specify their own privacy requirement, we can significantly improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (2015).
GTFeb 10, 2025
Incentivizing Desirable Effort Profiles in Strategic Classification: The Role of Causality and UncertaintyValia Efthymiou, Chara Podimata, Diptangshu Sen et al. · harvard
We study strategic classification in binary decision-making settings where agents can modify their features in order to improve their classification outcomes. Importantly, our work considers the causal structure across different features, acknowledging that effort in a given feature may affect other features. The main goal of our work is to understand \emph{when and how much agent effort is invested towards desirable features}, and how this is influenced by the deployed classifier, the causal structure of the agent's features, their ability to modify them, and the information available to the agent about the classifier and the feature causal graph. In the complete information case, when agents know the classifier and the causal structure of the problem, we derive conditions ensuring that rational agents focus on features favored by the principal. We show that designing classifiers to induce desirable behavior is generally non-convex, though tractable in special cases. We also extend our analysis to settings where agents have incomplete information about the classifier or the causal graph. While optimal effort selection is again a non-convex problem under general uncertainty, we highlight special cases of partial uncertainty where this selection problem becomes tractable. Our results indicate that uncertainty drives agents to favor features with higher expected importance and lower variance, potentially misaligning with principal preferences. Finally, numerical experiments based on a cardiovascular disease risk study illustrate how to incentivize desirable modifications under uncertainty.
GTMay 31, 2025
The Disparate Effects of Partial Information in Bayesian Strategic LearningSrikanth Avasarala, Serena Wang, Juba Ziani
We study how partial information about scoring rules affects fairness in strategic learning settings. In strategic learning, a learner deploys a scoring rule, and agents respond strategically by modifying their features -- at some cost -- to improve their outcomes. However, in our work, agents do not observe the scoring rule directly; instead, they receive a noisy signal of said rule. We consider two different agent models: (i) naive agents, who take the noisy signal at face value, and (ii) Bayesian agents, who update a prior belief based on the signal. Our goal is to understand how disparities in outcomes arise between groups that differ in their costs of feature modification, and how these disparities vary with the level of transparency of the learner's rule. For naive agents, we show that utility disparities can grow unboundedly with noise, and that the group with lower costs can, perhaps counter-intuitively, be disproportionately harmed under limited transparency. In contrast, for Bayesian agents, disparities remain bounded. We provide a full characterization of disparities across groups as a function of the level of transparency and show that they can vary non-monotonically with noise; in particular, disparities are often minimized at intermediate levels of transparency. Finally, we extend our analysis to settings where groups differ not only in cost, but also in prior beliefs, and study how this asymmetry influences fairness.
LGMay 23, 2025
KL-regularization Itself is Differentially Private in Bandits and RLHFYizhou Zhang, Kishan Panaganti, Laixi Shi et al.
Differential Privacy (DP) provides a rigorous framework for privacy, ensuring the outputs of data-driven algorithms remain statistically indistinguishable across datasets that differ in a single entry. While guaranteeing DP generally requires explicitly injecting noise either to the algorithm itself or to its outputs, the intrinsic randomness of existing algorithms presents an opportunity to achieve DP ``for free''. In this work, we explore the role of regularization in achieving DP across three different decision-making problems: multi-armed bandits, linear contextual bandits, and reinforcement learning from human feedback (RLHF), in offline data settings. We show that adding KL-regularization to the learning objective (a common approach in optimization algorithms) makes the action sampled from the resulting stochastic policy itself differentially private. This offers a new route to privacy guarantees without additional noise injection, while also preserving the inherent advantage of regularization in enhancing performance.
CRNov 7, 2024
Differential Privacy Overview and Fundamental TechniquesFerdinando Fioretto, Pascal Van Hentenryck, Juba Ziani
This chapter is meant to be part of the book "Differential Privacy in Artificial Intelligence: From Theory to Practice" and provides an introduction to Differential Privacy. It starts by illustrating various attempts to protect data privacy, emphasizing where and why they failed, and providing the key desiderata of a robust privacy definition. It then defines the key actors, tasks, and scopes that make up the domain of privacy-preserving data analysis. Following that, it formalizes the definition of Differential Privacy and its inherent properties, including composition, post-processing immunity, and group privacy. The chapter also reviews the basic techniques and mechanisms commonly used to implement Differential Privacy in its pure and approximate forms.
GTFeb 12, 2025
Last-iterate Convergence for Symmetric, General-sum, $2 \times 2$ Games Under The Exponential Weights DynamicGuanghui Wang, Krishna Acharya, Lokranjan Lakshmikanthan et al.
We conduct a comprehensive analysis of the discrete-time exponential-weights dynamic with a constant step size on all \emph{general-sum and symmetric} $2 \times 2$ normal-form games, i.e. games with $2$ pure strategies per player, and where the ensuing payoff tuple is of the form $(A,A^\top)$ (where $A$ is the $2 \times 2$ payoff matrix corresponding to the first player). Such symmetric games commonly arise in real-world interactions between "symmetric" agents who have identically defined utility functions -- such as Bertrand competition, multi-agent performative prediction, and certain congestion games -- and display a rich multiplicity of equilibria despite the seemingly simple setting. Somewhat surprisingly, we show through a first-principles analysis that the exponential weights dynamic, which is popular in online learning, converges in the last iterate for such games regardless of initialization with an appropriately chosen step size. For certain games and/or initializations, we further show that the convergence rate is in fact exponential and holds for any step size. We illustrate our theory with extensive simulations and applications to the aforementioned game-theoretic interactions. In the case of multi-agent performative prediction, we formulate a new "mortgage competition" game between lenders (i.e. banks) who interact with a population of customers, and show that it fits into our framework.
GTMar 1, 2021
Information Discrepancy in Strategic LearningYahav Bechavod, Chara Podimata, Zhiwei Steven Wu et al.
We initiate the study of the effects of non-transparency in decision rules on individuals' ability to improve in strategic learning settings. Inspired by real-life settings, such as loan approvals and college admissions, we remove the assumption typically made in the strategic learning literature, that the decision rule is fully known to individuals, and focus instead on settings where it is inaccessible. In their lack of knowledge, individuals try to infer this rule by learning from their peers (e.g., friends and acquaintances who previously applied for a loan), naturally forming groups in the population, each with possibly different type and level of information regarding the decision rule. We show that, in equilibrium, the principal's decision rule optimizing welfare across sub-populations may cause a strong negative externality: the true quality of some of the groups can actually deteriorate. On the positive side, we show that, in many natural cases, optimal improvement can be guaranteed simultaneously for all sub-populations. We further introduce a measure we term information overlap proxy, and demonstrate its usefulness in characterizing the disparity in improvements across sub-populations. Finally, we identify a natural condition under which improvement can be guaranteed for all sub-populations while maintaining high predictive accuracy. We complement our theoretical analysis with experiments on real-world datasets.
LGJun 12, 2020
Algorithms and Learning for Fair Portfolio DesignEmily Diana, Travis Dick, Hadi Elzayn et al.
We consider a variation on the classical finance problem of optimal portfolio design. In our setting, a large population of consumers is drawn from some distribution over risk tolerances, and each consumer must be assigned to a portfolio of lower risk than her tolerance. The consumers may also belong to underlying groups (for instance, of demographic properties or wealth), and the goal is to design a small number of portfolios that are fair across groups in a particular and natural technical sense. Our main results are algorithms for optimal and near-optimal portfolio design for both social welfare and fairness objectives, both with and without assumptions on the underlying group structure. We describe an efficient algorithm based on an internal two-player zero-sum game that learns near-optimal fair portfolios ex ante and show experimentally that it can be used to obtain a small set of fair portfolios ex post as well. For the special but natural case in which group structure coincides with risk tolerances (which models the reality that wealthy consumers generally tolerate greater risk), we give an efficient and optimal fair algorithm. We also provide generalization guarantees for the underlying risk distribution that has no dependence on the number of portfolios and illustrate the theory with simulation results.
LGFeb 17, 2020
Gaming Helps! Learning from Strategic Interactions in Natural DynamicsYahav Bechavod, Katrina Ligett, Zhiwei Steven Wu et al.
We consider an online regression setting in which individuals adapt to the regression model: arriving individuals are aware of the current model, and invest strategically in modifying their own features so as to improve the predicted score that the current model assigns to them. Such feature manipulation has been observed in various scenarios -- from credit assessment to school admissions -- posing a challenge for the learner. Surprisingly, we find that such strategic manipulations may in fact help the learner recover the meaningful variables -- that is, the features that, when changed, affect the true label (as opposed to non-meaningful features that have no effect). We show that even simple behavior on the learner's part allows her to simultaneously i) accurately recover the meaningful features, and ii) incentivize agents to invest in these meaningful features, providing incentives for improvement.
GTFeb 16, 2020
Pipeline InterventionsEshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth et al.
We introduce the \emph{pipeline intervention} problem, defined by a layered directed acyclic graph and a set of stochastic matrices governing transitions between successive layers. The graph is a stylized model for how people from different populations are presented opportunities, eventually leading to some reward. In our model, individuals are born into an initial position (i.e. some node in the first layer of the graph) according to a fixed probability distribution, and then stochastically progress through the graph according to the transition matrices, until they reach a node in the final layer of the graph; each node in the final layer has a \emph{reward} associated with it. The pipeline intervention problem asks how to best make costly changes to the transition matrices governing people's stochastic transitions through the graph, subject to a budget constraint. We consider two objectives: social welfare maximization, and a fairness-motivated maximin objective that seeks to maximize the value to the population (starting node) with the \emph{least} expected value. We consider two variants of the maximin objective that turn out to be distinct, depending on whether we demand a deterministic solution or allow randomization. For each objective, we give an efficient approximation algorithm (an additive FPTAS) for constant width networks. We also tightly characterize the "price of fairness" in our setting: the ratio between the highest achievable social welfare and the highest social welfare consistent with a maximin optimal solution. Finally we show that for polynomial width networks, even approximating the maximin objective to any constant factor is NP hard, even for networks with constant depth. This shows that the restriction on the width in our positive results is essential.
GTAug 27, 2018
Downstream Effects of Affirmative ActionSampath Kannan, Aaron Roth, Juba Ziani
We study a two-stage model, in which students are 1) admitted to college on the basis of an entrance exam which is a noisy signal about their qualifications (type), and then 2) those students who were admitted to college can be hired by an employer as a function of their college grades, which are an independently drawn noisy signal of their type. Students are drawn from one of two populations, which might have different type distributions. We assume that the employer at the end of the pipeline is rational, in the sense that it computes a posterior distribution on student type conditional on all information that it has available (college admissions, grades, and group membership), and makes a decision based on posterior expectation. We then study what kinds of fairness goals can be achieved by the college by setting its admissions rule and grading policy. For example, the college might have the goal of guaranteeing equal opportunity across populations: that the probability of passing through the pipeline and being hired by the employer should be independent of group membership, conditioned on type. Alternately, the college might have the goal of incentivizing the employer to have a group blind hiring rule. We show that both goals can be achieved when the college does not report grades. On the other hand, we show that under reasonable conditions, these goals are impossible to achieve even in isolation when the college uses an (even minimally) informative grading policy.
EMOct 10, 2017
Inference on Auctions with Weak Assumptions on InformationVasilis Syrgkanis, Elie Tamer, Juba Ziani
Given a sample of bids from independent auctions, this paper examines the question of inference on auction fundamentals (e.g. valuation distributions, welfare measures) under weak assumptions on information structure. The question is important as it allows us to learn about the valuation distribution in a robust way, i.e., without assuming that a particular information structure holds across observations. We leverage the recent contributions of \cite{Bergemann2013} in the robust mechanism design literature that exploit the link between Bayesian Correlated Equilibria and Bayesian Nash Equilibria in incomplete information games to construct an econometrics framework for learning about auction fundamentals using observed data on bids. We showcase our construction of identified sets in private value and common value auctions. Our approach for constructing these sets inherits the computational simplicity of solving for correlated equilibria: checking whether a particular valuation distribution belongs to the identified set is as simple as determining whether a {\it linear} program is feasible. A similar linear program can be used to construct the identified set on various welfare measures and counterfactual objects. For inference and to summarize statistical uncertainty, we propose novel finite sample methods using tail inequalities that are used to construct confidence regions on sets. We also highlight methods based on Bayesian bootstrap and subsampling. A set of Monte Carlo experiments show adequate finite sample properties of our inference procedures. We illustrate our methods using data from OCS auctions.