LGAug 30, 2022
On the Trade-Off between Actionable Explanations and the Right to be ForgottenMartin Pawelczyk, Tobias Leemann, Asia Biega et al.
As machine learning (ML) models are increasingly being deployed in high-stakes applications, policymakers have suggested tighter data protection regulations (e.g., GDPR, CCPA). One key principle is the "right to be forgotten" which gives users the right to have their data deleted. Another key principle is the right to an actionable explanation, also known as algorithmic recourse, allowing users to reverse unfavorable decisions. To date, it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. More specifically, we theoretically and empirically analyze the behavior of popular state-of-the-art algorithms and demonstrate that the recourses generated by these algorithms are likely to be invalidated if a small number of data deletion requests (e.g., 1 or 2) warrant updates of the predictive model. For the setting of differentiable models, we suggest a framework to identify a minimal subset of critical training points which, when removed, maximize the fraction of invalidated recourses. Using our framework, we empirically show that the removal of as little as 2 data instances from the training set can invalidate up to 95 percent of all recourses output by popular state-of-the-art algorithms. Thus, our work raises fundamental questions about the compatibility of "the right to an actionable explanation" in the context of the "right to be forgotten", while also providing constructive insights on the determining factors of recourse robustness.
IRSep 21, 2024
The trade-off between data minimization and fairness in collaborative filteringNasim Sonboli, Sipei Li, Mehdi Elahi et al.
General Data Protection Regulations (GDPR) aim to safeguard individuals' personal information from harm. While full compliance is mandatory in the European Union and the California Privacy Rights Act (CPRA), it is not in other places. GDPR requires simultaneous compliance with all the principles such as fairness, accuracy, and data minimization. However, it overlooks the potential contradictions within its principles. This matter gets even more complex when compliance is required from decision-making systems. Therefore, it is essential to investigate the feasibility of simultaneously achieving the goals of GDPR and machine learning, and the potential tradeoffs that might be forced upon us. This paper studies the relationship between the principles of data minimization and fairness in recommender systems. We operationalize data minimization via active learning (AL) because, unlike many other methods, it can preserve a high accuracy while allowing for strategic data collection, hence minimizing the amount of data collection. We have implemented several active learning strategies (personalized and non-personalized) and conducted a comparative analysis focusing on accuracy and fairness on two publicly available datasets. The results demonstrate that different AL strategies may have different impacts on the accuracy of recommender systems with nearly all strategies negatively impacting fairness. There has been no to very limited work on the trade-off between data minimization and fairness, the pros and cons of active learning methods as tools for implementing data minimization, and the potential impacts of AL on fairness. By exploring these critical aspects, we offer valuable insights for developing recommender systems that are GDPR compliant.
CYApr 19
Co-designing for Compliance: Multi-party Computation Protocols for Post-Market Fairness Monitoring in Algorithmic HiringChangyang He, Nina Baranowska, Josu Andoni Eguíluz Castañeira et al.
Post-market fairness monitoring is now mandated to ensure fairness and accountability for high-risk employment AI systems under emerging regulations such as the EU AI Act. However, effective fairness monitoring often requires access to sensitive personal data, which is subject to strict legal protections under data protection law. Multi-party computation (MPC) offers a promising technical foundation for compliant post-market fairness monitoring, enabling the secure computation of fairness metrics without revealing sensitive attributes. Despite growing technical interest, the operationalization of MPC-based fairness monitoring in real-world hiring contexts under concrete legal, industrial, and usability constraints remains unknown. This work addresses this gap through a co-design approach integrating technical, legal, and industrial expertise. We identify practical design requirements for MPC-based fairness monitoring, develop an end-to-end, legally compliant protocol spanning the full data lifecycle, and empirically validate it in a large-scale industrial setting. Our findings provide actionable design insights as well as legal and industrial implications for deploying MPC-based post-market fairness monitoring in algorithmic hiring systems.
HCApr 19
Value Sensitive Design for Fair Online Recruitment: A Conceptual Framework Informed by Job Seekers' Fairness ConcernsChangyang He, Yue Deng, Alessandro Fabris et al.
The susceptibility to biases and discrimination is a pressing issue in today's labor markets. While digital recruitment systems play an increasingly significant role in human resource management, a systematic understanding of human-centered design principles for fair online hiring remains lacking, particularly considering the gap between idealized conceptualizations of fairness in research and actual fairness concerns expressed by job seekers. To address this gap, this work explores the potential of developing a fair recruitment framework based on job seekers' fairness concerns shared in r/jobs, one of the largest online job communities. Through a grounded theory approach, we uncover four overarching themes of job seekers' fairness concerns: personal attribute discrimination beyond legally protected attributes, interaction biases, improper interpretations of qualifications, and power imbalance. Drawing on value sensitive design, we derive design implications for fair algorithms and interfaces in recruitment systems, integrating them into a conceptual framework that spans different hiring stages.
AIAug 1, 2024
Unlocking Fair Use in the Generative AI Supply Chain: A Systematized Literature ReviewAmruta Mahuli, Asia Biega
Through a systematization of generative AI (GenAI) stakeholder goals and expectations, this work seeks to uncover what value different stakeholders see in their contributions to the GenAI supply line. This valuation enables us to understand whether fair use advocated by GenAI companies to train model progresses the copyright law objective of promoting science and arts. While assessing the validity and efficacy of the fair use argument, we uncover research gaps and potential avenues for future works for researchers and policymakers to address.
LGFeb 17, 2025
What's in a Query: Polarity-Aware Distribution-Based Fair RankingAparna Balagopalan, Kai Wang, Olawale Salaudeen et al.
Machine learning-driven rankings, where individuals (or items) are ranked in response to a query, mediate search exposure or attention in a variety of safety-critical settings. Thus, it is important to ensure that such rankings are fair. Under the goal of equal opportunity, attention allocated to an individual on a ranking interface should be proportional to their relevance across search queries. In this work, we examine amortized fair ranking -- where relevance and attention are cumulated over a sequence of user queries to make fair ranking more feasible in practice. Unlike prior methods that operate on expected amortized attention for each individual, we define new divergence-based measures for attention distribution-based fairness in ranking (DistFaiR), characterizing unfairness as the divergence between the distribution of attention and relevance corresponding to an individual over time. This allows us to propose new definitions of unfairness, which are more reliable at test time. Second, we prove that group fairness is upper-bounded by individual fairness under this definition for a useful class of divergence measures, and experimentally show that maximizing individual fairness through an integer linear programming-based optimization is often beneficial to group fairness. Lastly, we find that prior research in amortized fair ranking ignores critical information about queries, potentially leading to a fairwashing risk in practice by making rankings appear more fair than they actually are.
IRMay 9, 2023
The Role of Relevance in Fair RankingAparna Balagopalan, Abigail Z. Jacobs, Asia Biega
Online platforms mediate access to opportunity: relevance-based rankings create and constrain options by allocating exposure to job openings and job candidates in hiring platforms, or sellers in a marketplace. In order to do so responsibly, these socially consequential systems employ various fairness measures and interventions, many of which seek to allocate exposure based on worthiness. Because these constructs are typically not directly observable, platforms must instead resort to using proxy scores such as relevance and infer them from behavioral signals such as searcher clicks. Yet, it remains an open question whether relevance fulfills its role as such a worthiness score in high-stakes fair rankings. In this paper, we combine perspectives and tools from the social sciences, information retrieval, and fairness in machine learning to derive a set of desired criteria that relevance scores should satisfy in order to meaningfully guide fairness interventions. We then empirically show that not all of these criteria are met in a case study of relevance inferred from biased user click data. We assess the impact of these violations on the estimated system fairness and analyze whether existing fairness interventions may mitigate the identified issues. Our analyses and results surface the pressing need for new approaches to relevance collection and generation that are suitable for use in fair ranking.
IRAug 11, 2021
Estimation of Fair Ranking Metrics with Incomplete JudgmentsÖmer Kırnap, Fernando Diaz, Asia Biega et al.
There is increasing attention to evaluating the fairness of search system ranking decisions. These metrics often consider the membership of items to particular groups, often identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with very limited number of labeled items. We evaluate our approach using both simulated and real world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.
LGJul 16, 2021
Learning to Limit Data Collection via Scaling Laws: A Computational Interpretation for the Legal Principle of Data MinimizationDivya Shanmugam, Samira Shabanian, Fernando Diaz et al.
Modern machine learning systems are increasingly characterized by extensive personal data collection, despite the diminishing returns and increasing societal costs of such practices. Yet, data minimisation is one of the core data protection principles enshrined in the European Union's General Data Protection Regulation ('GDPR') and requires that only personal data that is adequate, relevant and limited to what is necessary is processed. However, the principle has seen limited adoption due to the lack of technical interpretation. In this work, we build on literature in machine learning and law to propose FIDO, a Framework for Inhibiting Data Overcollection. FIDO learns to limit data collection based on an interpretation of data minimization tied to system performance. Concretely, FIDO provides a data collection stopping criterion by iteratively updating an estimate of the performance curve, or the relationship between dataset size and performance, as data is acquired. FIDO estimates the performance curve via a piecewise power law technique that models distinct phases of an algorithm's performance throughout data collection separately. Empirical experiments show that the framework produces accurate performance curves and data collection stopping criteria across datasets and feature acquisition algorithms. We further demonstrate that many other families of curves systematically overestimate the return on additional data. Results and analysis from our investigation offer deeper insights into the relevant considerations when designing a data minimization framework, including the impacts of active feature acquisition on individual users and the feasability of user-specific data minimization. We conclude with practical recommendations for the implementation of data minimization.
IRDec 20, 2019
Report on the First HIPstIR Workshop on the Future of Information RetrievalLaura Dietz, Bhaskar Mitra, Jeremy Pickens et al.
The vision of HIPstIR is that early stage information retrieval (IR) researchers get together to develop a future for non-mainstream ideas and research agendas in IR. The first iteration of this vision materialized in the form of a three day workshop in Portsmouth, New Hampshire attended by 24 researchers across academia and industry. Attendees pre-submitted one or more topics that they want to pitch at the meeting. Then over the three days during the workshop, we self-organized into groups and worked on six specific proposals of common interest. In this report, we present an overview of the workshop and brief summaries of the six proposals that resulted from the workshop.