45.8 · CY · May 26
Fixed Points and Stochastic Meritocracies: A Long-Term Perspective
Gaurab Pokharel, Diptangshu Sen, Sanmay Das et al.
We study group fairness in the context of feedback loops induced by meritocratic selection into programs that themselves confer additional advantage, like college admissions. We introduce a stylized, yet novel inter-generational model for the setting and analyze it in situations where there are no underlying differences between two populations. When the benefit of the program (or the harm of not getting into it) is completely symmetric, we show that disparities between the two populations will vanish on average in the long term, although in the short term disparities will continue to arise and dissipate cyclically. Further, the time an accumulated advantage takes to dissipate can be significant, and increases as a function of the relative importance of the program in conveying benefits. Interestingly, significant disparities can arise purely due to randomness even from completely symmetric initial conditions, especially when populations are small. The introduction of even a slight asymmetry, where the group that has accumulated an advantage becomes slightly preferred, leads to a completely different outcome. In these instances, starting from completely symmetric initial conditions, disparities between groups arise stochastically and then persist over time, yielding a permanent advantage for one group. Our analysis precisely characterizes conditions under which disparities persist or diminish, with a particular focus on the role of the scarcity of available spots in the program and its effectiveness. We also present extensive simulations in a richer model that further support our theoretical results in the simpler, stylized model. Our findings are relevant for the design and implementation of algorithmic fairness interventions in similar selection processes.
68.9 · GT · May 18
Data Sharing with Endogenous Choices over Differential Privacy Levels
Raef Bassily, Kate Donahue, Diptangshu Sen et al.
Motivated by the rapid push to decentralize sharing of data, we study whether large-scale data sharing coalitions can form in a decentralized manner under differential privacy when players have heterogeneous privacy preferences. We first consider a fully decentralized data-sharing mechanism in which each player decides whether to participate and how much privacy noise to add locally to their sensitive data before sharing. Privacy choices induce a fundamental trade-off: higher privacy lowers individual privacy costs but reduces data utility and statistical accuracy for the coalition. These choices generate externalities across players, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how fully decentralized data-sharing compares to a centralized, socially optimal benchmark when the number of players is large. We provide a comprehensive analysis across multiple privacy-cost regimes corresponding to different attack/observation models in differential privacy, showing that full decentralization is highly inefficient in terms of both social welfare and estimator accuracy. Surprisingly, we find that a simple partially decentralized mechanism (where players still retain participation agency, but a central designer chooses a fixed privacy noise level for everyone) closes this efficiency gap down to constant factors across all privacy-cost regimes.
22.3 · GT · May 25
The Impact of Competition on Outcomes of Score-Based College Admissions
George Bentley, Diptangshu Sen, Juba Ziani
We study how the design of admissions policies affects the ability of students admitted to universities. In our model, applicants have a multi-dimensional ability, which is a combination of a "type" and a "soft skill." Universities may differ in how they evaluate quality and have differing preferences on type and soft skills. Then, university admissions rely on a single noisy aggregate signal, such as a test score, that may not fully align with the university's preferences, and a university evaluates applicants through the posterior expectations of their preference metric given the observed signal. Our main results highlight that the design of good admission policies can be counter-intuitive. Under a single university, when holding the number of qualified applicants constant, increasing the usefulness of the signal (by aligning it more closely with the university preferences) leads to a worse type and soft skill for admitted students. Further, a university cannot affect the composition of students that are strong on type versus soft skills by changing their preferences. The picture becomes even more complicated under competition between as few as two universities: self-selection effects among students admitted to both universities can lead to part of the applicant pool switching which university they prefer, even under small changes in the design of the noisy signal. This can, in particular, lead to sudden and non-monotonic loss in the quality of admitted students when changing the alignment between signal and university preferences. Further, a university can get more students by increasing their selectivity. Finally, when admissions rely on separate noisy scores for type and for soft skills, we show that universities that put more emphasis on type (respectively soft skills) end up, counter-intuitively, admitting students with higher soft skills (respectively type).
CR · Aug 8, 2024
Differentially Private Data Release on Graphs: Inefficiencies and Unfairness
Ferdinando Fioretto, Diptangshu Sen, Juba Ziani
Networks are crucial components of many sectors, including telecommunications, healthcare, finance, energy, and transportation. The information carried in such networks often contains sensitive user data, like location data for commuters and packet data for online users. Therefore, when considering data release for networks, one must ensure that data release mechanisms do not leak information about individuals, quantified in a precise mathematical sense. Differential Privacy (DP) is the widely accepted, formal, state-of-the-art technique, which has found use in a variety of real-life settings including the 2020 U.S. Census, Apple users' device data, and Google's location data. Yet the use of DP comes with new challenges: the noise added for privacy introduces inaccuracies or biases, and DP techniques can also distribute these biases disproportionately across different populations, inducing fairness issues. The goal of this paper is to characterize the impact of DP on bias and unfairness in the context of releasing information about networks, departing from previous work, which has studied these effects in the context of private population count releases (such as in the U.S. Census). To this end, we consider a network release problem where the network structure is known to all, but the weights on edges must be released privately. We consider the impact of this private release on a simple downstream decision-making task run by a third party, which is to find the shortest path between any pair of nodes and recommend the best route to users. This setting is of high practical relevance, mirroring scenarios in transportation networks, where preserving privacy while providing accurate routing information is crucial. Our work provides theoretical foundations and empirical evidence on the bias and unfairness arising due to privacy in these networked decision problems.
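The setting in this abstract (public graph structure, privately released edge weights, downstream shortest-path routing) can be illustrated with a minimal sketch. It is not the authors' mechanism or experimental setup. The Laplace mechanism, the L1 sensitivity of 1, the clamping of noisy weights at zero, the toy four-node graph, and all function names are assumptions chosen for illustration.

```python
import heapq
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def privatize_weights(edges, epsilon, sensitivity=1.0, seed=0):
    # Assumed release rule: Laplace mechanism with scale sensitivity / epsilon.
    # Clamping at zero is post-processing, so it does not weaken the guarantee.
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {e: max(0.0, w + laplace_noise(scale, rng)) for e, w in edges.items()}


def shortest_path(edges, source, target):
    # Dijkstra on an undirected graph given as {(u, v): weight}.
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def path_cost(edges, path):
    # Cost of a path measured on a given set of (undirected) edge weights.
    total = 0.0
    for u, v in zip(path, path[1:]):
        total += edges[(u, v)] if (u, v) in edges else edges[(v, u)]
    return total


# Hypothetical toy network: two routes from "a" to "d".
true_edges = {("a", "b"): 1.0, ("b", "d"): 1.0, ("a", "c"): 1.5, ("c", "d"): 1.0}
noisy_edges = privatize_weights(true_edges, epsilon=1.0, seed=0)

true_path = shortest_path(true_edges, "a", "d")
noisy_path = shortest_path(noisy_edges, "a", "d")
# Extra true cost paid by a user who follows the route chosen on noisy weights.
excess_cost = path_cost(true_edges, noisy_path) - path_cost(true_edges, true_path)
```

Here `excess_cost` is never negative, because the route chosen on noisy weights cannot beat the true optimum when measured on true weights. Comparing this quantity across different source-destination pairs is one simple way to examine how unevenly the privacy noise affects different users.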
GT · Feb 10, 2025
Incentivizing Desirable Effort Profiles in Strategic Classification: The Role of Causality and Uncertainty
Valia Efthymiou, Chara Podimata, Diptangshu Sen et al. · harvard
We study strategic classification in binary decision-making settings where agents can modify their features in order to improve their classification outcomes. Importantly, our work considers the causal structure across different features, acknowledging that effort in a given feature may affect other features. The main goal of our work is to understand when and how much agent effort is invested towards desirable features, and how this is influenced by the deployed classifier, the causal structure of the agent's features, their ability to modify them, and the information available to the agent about the classifier and the feature causal graph. In the complete information case, when agents know the classifier and the causal structure of the problem, we derive conditions ensuring that rational agents focus on features favored by the principal. We show that designing classifiers to induce desirable behavior is generally non-convex, though tractable in special cases. We also extend our analysis to settings where agents have incomplete information about the classifier or the causal graph. While optimal effort selection is again a non-convex problem under general uncertainty, we highlight special cases of partial uncertainty where this selection problem becomes tractable. Our results indicate that uncertainty drives agents to favor features with higher expected importance and lower variance, potentially misaligning with principal preferences. Finally, numerical experiments based on a cardiovascular disease risk study illustrate how to incentivize desirable modifications under uncertainty.