Hilde Weerts

h-index: 6

9 papers

456 citations

Novelty: 28%

AI Score: 24

Ranked #172,896 of 194,257 authors (top 89%) · #37,488 in LG (top 93%)

9 Papers

29.7 · LG · Mar 29, 2023 · Code

Fairlearn: Assessing and Improving Fairness of AI Systems

Hilde Weerts, Miroslav Dudík, Richard Edgar et al. · microsoft-research

Fairlearn is an open source project to help practitioners assess and improve fairness of artificial intelligence (AI) systems. The associated Python library, also named fairlearn, supports evaluation of a model's output across affected populations and includes several algorithms for mitigating fairness issues. Grounded in the understanding that fairness is a sociotechnical challenge, the project integrates learning resources that aid practitioners in considering a system's broader societal context.
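As a rough illustration of what "evaluation of a model's output across affected populations" can mean in practice, the sketch below computes a per-group selection rate and the largest gap between groups. It deliberately uses plain Python rather than the fairlearn API; the function names and the toy data are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def selection_rate_by_group(y_pred, groups):
    """Return the fraction of positive predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rate_by_group(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Toy predictions for two hypothetical groups "a" and "b".
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rate_by_group(y_pred, groups)
gap = demographic_parity_difference(y_pred, groups)
print(rates, gap)
```

A gap of zero would mean both groups receive positive predictions at the same rate; whether that is the right target depends on the societal context the abstract refers to.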

16.8 · AI · Mar 15, 2023

Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML

Hilde Weerts, Florian Pfisterer, Matthias Feurer et al.

The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to propose AutoML systems that jointly optimize fairness and predictive performance to mitigate fairness-related harm. However, fairness is a complex and inherently interdisciplinary subject, and solely posing it as an optimization problem can have adverse side effects. With this work, we aim to raise awareness among developers of AutoML systems about such limitations of fairness-aware AutoML, while also calling attention to the potential of AutoML as a tool for fairness research. We present a comprehensive overview of different ways in which fairness-related harm can arise and the ensuing implications for the design of fairness-aware AutoML. We conclude that while fairness cannot be automated, fairness-aware AutoML can play an important role in the toolbox of ML practitioners. We highlight several open technical challenges for future work in this direction. Additionally, we advocate for the creation of more user-centered assistive systems designed to tackle challenges encountered in fairness work.

9.6 · AI · Apr 18, 2024

The Neutrality Fallacy: When Algorithmic Fairness Interventions are (Not) Positive Action

Hilde Weerts, Raphaële Xenidis, Fabien Tarissan et al.

Various metrics and interventions have been developed to identify and mitigate unfair outputs of machine learning systems. While individuals and organizations have an obligation to avoid discrimination, the use of fairness-aware machine learning interventions has also been described as amounting to 'algorithmic positive action' under European Union (EU) non-discrimination law. As the Court of Justice of the European Union has been strict when it comes to assessing the lawfulness of positive action, this would impose a significant legal burden on those wishing to implement fair-ml interventions. In this paper, we propose that algorithmic fairness interventions often should be interpreted as a means to prevent discrimination, rather than a measure of positive action. Specifically, we suggest that this category mistake can often be attributed to neutrality fallacies: faulty assumptions regarding the neutrality of fairness-aware algorithmic decision-making. Our findings raise the question of whether a negative obligation to refrain from discrimination is sufficient in the context of algorithmic decision-making. Consequently, we suggest moving away from a duty to 'not do harm' towards a positive obligation to actively 'do no harm' as a more adequate framework for algorithmic decision-making and fair-ml interventions.

8.5 · AI · Apr 22, 2024

Unlawful Proxy Discrimination: A Framework for Challenging Inherently Discriminatory Algorithms

Hilde Weerts, Aislinn Kelly-Lyth, Reuben Binns et al.

Emerging scholarship suggests that the EU legal concept of direct discrimination - where a person is given different treatment on grounds of a protected characteristic - may apply to various algorithmic decision-making contexts. This has important implications: unlike indirect discrimination, there is generally no 'objective justification' stage in the direct discrimination framework, which means that the deployment of directly discriminatory algorithms will usually be unlawful per se. In this paper, we focus on the most likely candidate for direct discrimination in the algorithmic context, termed inherent direct discrimination, where a proxy is inextricably linked to a protected characteristic. We draw on computer science literature to suggest that, in the algorithmic context, 'treatment on the grounds of' needs to be understood in terms of two steps: proxy capacity and proxy use. Only where both elements can be made out can direct discrimination be said to be 'on grounds of' a protected characteristic. We analyse the legal conditions of our proposed proxy capacity and proxy use tests. Based on this analysis, we discuss technical approaches and metrics that could be developed or applied to identify inherent direct discrimination in algorithmic decision-making.

11.3 · CY · May 5, 2023

Algorithmic Unfairness through the Lens of EU Non-Discrimination Law: Or Why the Law is not a Decision Tree

Hilde Weerts, Raphaële Xenidis, Fabien Tarissan et al.

Concerns regarding unfairness and discrimination in the context of artificial intelligence (AI) systems have recently received increased attention from both legal and computer science scholars. Yet, the degree of overlap between notions of algorithmic bias and fairness on the one hand, and legal notions of discrimination and equality on the other, is often unclear, leading to misunderstandings between computer science and law. What types of bias and unfairness does the law address when it prohibits discrimination? What role can fairness metrics play in establishing legal compliance? In this paper, we aim to illustrate to what extent European Union (EU) non-discrimination law coincides with notions of algorithmic fairness proposed in computer science literature and where they differ. The contributions of this paper are as follows. First, we analyse seminal examples of algorithmic unfairness through the lens of EU non-discrimination law, drawing parallels with EU case law. Second, we set out the normative underpinnings of fairness metrics and technical interventions and compare these to the legal reasoning of the Court of Justice of the EU. Specifically, we show how normative assumptions often remain implicit in both disciplinary approaches and explain the ensuing limitations of current AI practice and non-discrimination law. We conclude with implications for AI practitioners and regulators.

8.7 · LG · Feb 17, 2022

Are There Exceptions to Goodhart's Law? On the Moral Justification of Fairness-Aware Machine Learning

Hilde Weerts, Lambèr Royakkers, Mykola Pechenizkiy

Fairness-aware machine learning (fair-ml) techniques are algorithmic interventions designed to ensure that individuals who are affected by the predictions of a machine learning model are treated fairly. The problem is often posed as an optimization problem, where the objective is to achieve high predictive performance under a quantitative fairness constraint. However, any attempt to design a fair-ml algorithm must assume a world where Goodhart's law has an exception: when a fairness measure becomes an optimization constraint, it does not cease to be a good measure. In this paper, we argue that fairness measures are particularly sensitive to Goodhart's law. Our main contributions are as follows. First, we present a framework for moral reasoning about the justification of fairness metrics. In contrast to existing work, our framework incorporates the belief that whether a distribution of outcomes is fair depends not only on the cause of inequalities but also on what moral claims decision subjects have to receive a particular benefit or avoid a burden. We use the framework to distil moral and empirical assumptions under which particular fairness metrics correspond to a fair distribution of outcomes. Second, we explore the extent to which employing fairness metrics as a constraint in a fair-ml algorithm is morally justifiable, exemplified by the fair-ml algorithm introduced by Hardt et al. (2016). We illustrate that enforcing a fairness metric through a fair-ml algorithm often does not result in the fair distribution of outcomes that motivated its use and can even harm the individuals the intervention was intended to protect.
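To make "a quantitative fairness constraint" concrete, the sketch below measures how far a set of predictions is from equalized odds, the criterion commonly associated with Hardt et al. (2016). The abstract does not say which metric the paper analyses, so treating equalized odds as the example is an assumption; the function names and toy data are hypothetical, and the code only measures the gap rather than enforcing it.

```python
def rates_by_group(y_true, y_pred, groups):
    """Per-group true positive rate and false positive rate.

    Assumes every group contains at least one positive and one
    negative label; otherwise the division below would fail.
    """
    counts = {}
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if t == 1:
            c["tp" if p == 1 else "fn"] += 1
        else:
            c["fp" if p == 1 else "tn"] += 1
    return {
        g: {
            "tpr": c["tp"] / (c["tp"] + c["fn"]),
            "fpr": c["fp"] / (c["fp"] + c["tn"]),
        }
        for g, c in counts.items()
    }


def equalized_odds_gap(y_true, y_pred, groups):
    """Larger of the between-group TPR gap and FPR gap (0 means parity)."""
    rates = rates_by_group(y_true, y_pred, groups)
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy labels and predictions for two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

group_rates = rates_by_group(y_true, y_pred, groups)
eo_gap = equalized_odds_gap(y_true, y_pred, groups)
print(group_rates, eo_gap)
```

Driving such a gap to zero is exactly the kind of optimization target the abstract cautions about: a small gap on the metric need not correspond to a fair distribution of outcomes.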

15.0 · LG · Jul 15, 2020

Importance of Tuning Hyperparameters of Machine Learning Algorithms

Hilde J. P. Weerts, Andreas C. Mueller, Joaquin Vanschoren

The performance of many machine learning algorithms depends on their hyperparameter settings. The goal of this study is to determine whether it is important to tune a hyperparameter or whether it can be safely set to a default value. We present a methodology to determine the importance of tuning a hyperparameter based on a non-inferiority test and tuning risk: the performance loss that is incurred when a hyperparameter is not tuned, but set to a default value. Because our methods require the notion of a default parameter, we present a simple procedure that can be used to determine reasonable default parameters. We apply our methods in a benchmark study using 59 datasets from OpenML. Our results show that leaving particular hyperparameters at their default value is non-inferior to tuning these hyperparameters. In some cases, leaving the hyperparameter at its default value even outperforms tuning it using a search procedure with a limited number of iterations.
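The definition of tuning risk given above (the performance lost when a hyperparameter stays at its default instead of being tuned) can be sketched as a per-dataset difference in scores. The margin check below is a naive comparison of the mean against a tolerance, not the non-inferiority test used in the paper, whose details the abstract does not give; all names and numbers are hypothetical.

```python
from statistics import mean


def tuning_risk(tuned_scores, default_scores):
    """Per-dataset performance loss from using the default value.

    Positive values mean tuning helped; negative values mean the
    default did better than the tuned setting.
    """
    return [t - d for t, d in zip(tuned_scores, default_scores)]


def within_margin(risks, margin):
    """Naive check: is the average loss no larger than the margin?

    This is only an illustration, not a statistical test.
    """
    return mean(risks) <= margin


# Hypothetical accuracy scores on three datasets.
tuned = [0.90, 0.85, 0.80]
default = [0.89, 0.86, 0.78]

risks = tuning_risk(tuned, default)
print(risks, within_margin(risks, margin=0.01))
```

On the second dataset the risk is negative, mirroring the abstract's observation that a default can occasionally outperform a limited search.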

2.7 · LG · Jul 7, 2019

Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models

Hilde J. P. Weerts, Werner van Ipenburg, Mykola Pechenizkiy

In many contexts, it can be useful for domain experts to understand to what extent predictions made by a machine learning model can be trusted. In particular, estimates of trustworthiness can be useful for fraud analysts who process machine learning-generated alerts of fraudulent transactions. In this work, we present a case-based reasoning (CBR) approach that provides evidence on the trustworthiness of a prediction in the form of a visualization of similar previous instances. Unlike previous work, we consider similarity of local post-hoc explanations of predictions and show empirically that our visualization can be useful for processing alerts. Furthermore, our approach is perceived as useful and easy to use by fraud analysts at a major Dutch bank.
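The retrieval step behind such a case-based approach can be sketched as a nearest-neighbour lookup over explanation vectors instead of raw features. The abstract does not specify the distance measure or the data layout, so the Euclidean distance, the `case_base` structure, and the attribution values below are all assumptions made for illustration.

```python
import math


def nearest_cases(query_explanation, case_base, k=2):
    """Return the k stored cases whose explanation vectors are closest.

    Each case is a dict with an "explanation" list of per-feature
    attribution scores; distance is plain Euclidean distance.
    """
    ranked = sorted(
        case_base,
        key=lambda case: math.dist(query_explanation, case["explanation"]),
    )
    return ranked[:k]


# Hypothetical previously processed alerts with known outcomes.
case_base = [
    {"id": "c1", "explanation": [0.8, -0.1, 0.0], "was_fraud": True},
    {"id": "c2", "explanation": [0.6, 0.1, 0.2], "was_fraud": True},
    {"id": "c3", "explanation": [-0.2, 0.5, 0.3], "was_fraud": False},
]

# Explanation vector of the new alert under review.
query = [0.75, -0.05, 0.05]

neighbours = nearest_cases(query, case_base, k=2)
fraud_share = sum(c["was_fraud"] for c in neighbours) / len(neighbours)
print([c["id"] for c in neighbours], fraud_share)
```

The share of confirmed fraud among the retrieved neighbours is one simple piece of evidence an analyst could weigh; the paper itself presents the similar cases as a visualization.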

16.5 · LG · Jul 7, 2019

A Human-Grounded Evaluation of SHAP for Alert Processing

Hilde J. P. Weerts, Werner van Ipenburg, Mykola Pechenizkiy

In recent years, many new explanation methods have been proposed to achieve interpretability of machine learning predictions. However, the utility of these methods in practical applications has not been researched extensively. In this paper, we present the results of a human-grounded evaluation of SHAP, an explanation method that has been well-received in the XAI and related communities. In particular, we study whether this local model-agnostic explanation method can be useful for real human domain experts to assess the correctness of positive predictions, i.e. alerts generated by a classifier. We conducted experiments with three different groups of participants (159 in total), who had basic knowledge of explainable machine learning. We performed a qualitative analysis of recorded reflections of experiment participants performing alert processing with and without SHAP information. The results suggest that the SHAP explanations do impact the decision-making process, although the model's confidence score remains a leading source of evidence. We statistically test whether there is a significant difference in task utility metrics between tasks for which an explanation was available and tasks in which it was not provided. Contrary to common intuition, we did not find a significant difference in alert processing performance when a SHAP explanation is available compared to when it is not.