Ulrich Aïvodji

LG
h-index30
17papers
404citations
Novelty57%
AI Score51

17 Papers

LGMay 26
Test-Time Collective Action: Proxy-Based Perturbations for Correcting Algorithmic Harms

Meghana Bhange, Ulrich Aïvodji, Elliot Creager

When machine learning systems under-perform for particular subgroups, affected users typically have no way to correct these disparities without relying on platform-level fixes. Existing approaches to algorithmic fairness rely on provider-centric approaches to correct these failures, leaving users with no external lever when faced with harm. Recent work in Algorithmic Collective Action shows that coordinated users can steer an algorithmic system toward a collective goal, but the existing mechanisms require the provider to retrain on the collective's modified data which users may not have control over. We propose Test-Time Collective Action (TTCA), a framework through which a group of users who share query access to the platform, can correct disparities affecting under-served subgroup without participating in the platform's training loop. We implement this through a proxy-based mechanism where the collective pools query access to a black-box API to extract a proxy of the platform, then optimizes a per-class universal perturbation against the proxy. Each member applies this perturbation to their own inputs at submission time, requiring no cooperation from the platform. We empirically evaluate the mechanism on CIFAR-10, CIFAR-100, and FairFace, showing that modestly-sized collectives close most of the subgroup accuracy gap, transfer across architectures (a small proxy can attack a larger platform), and improve worst-group accuracy, equal-opportunity gap, and disparate impact. A query-budget analysis comparing a per-user black-box attack baseline shows that pooling is cheaper than each subgroup member attacking alone. Test-time collective action thus offers corrective intervention to users when platform-side remediation is unavailable or delayed.

LGMay 30, 2022
Fool SHAP with Stealthily Biased Sampling

Gabriel Laberge, Ulrich Aïvodji, Satoshi Hara et al.

SHAP explanations aim at identifying which features contribute the most to the difference in model prediction at a specific input versus a background distribution. Recent studies have shown that they can be manipulated by malicious adversaries to produce arbitrary desired explanations. However, existing attacks focus solely on altering the black-box model itself. In this paper, we propose a complementary family of attacks that leave the model intact and manipulate SHAP explanations using stealthily biased sampling of the data points used to approximate expectations w.r.t the background distribution. In the context of fairness audit, we show that our attack can reduce the importance of a sensitive feature when explaining the difference in outcomes between groups while remaining undetected. More precisely, experiments performed on real-world datasets showed that our attack could yield up to a 90\% relative decrease in amplitude of the sensitive feature attribution. These results highlight the manipulability of SHAP explanations and encourage auditors to treat them with skepticism.

LGSep 2, 2022
Exploiting Fairness to Enhance Sensitive Attributes Reconstruction

Julien Ferry, Ulrich Aïvodji, Sébastien Gambs et al.

In recent years, a growing body of work has emerged on how to learn machine learning models under fairness constraints, often expressed with respect to some sensitive attributes. In this work, we consider the setting in which an adversary has black-box access to a target model and show that information about this model's fairness can be exploited by the adversary to enhance his reconstruction of the sensitive attributes of the training data. More precisely, we propose a generic reconstruction correction method, which takes as input an initial guess made by the adversary and corrects it to comply with some user-defined constraints (such as the fairness information) while minimizing the changes in the adversary's guess. The proposed method is agnostic to the type of target model, the fairness-aware learning method as well as the auxiliary knowledge of the adversary. To assess the applicability of our approach, we have conducted a thorough experimental evaluation on two state-of-the-art fair learning methods, using four different fairness metrics with a wide range of tolerances and with three datasets of diverse sizes and sensitive attributes. The experimental results demonstrate the effectiveness of the proposed approach to improve the reconstruction of the sensitive attributes of the training set.

LGMay 27
When Interpretability Is Unequally Distributed: Fairness in Hybrid Interpretable Models

Ziba Jabbar Zare, Ulrich Aïvodji, Julien Ferry et al.

Hybrid interpretable models combine a transparent component with a black-box model by assigning some examples to the former and deferring the rest to the latter. While this design enables flexible tradeoffs between accuracy and interpretability, it also raises a distinct procedural fairness concern: some demographic groups may systematically receive interpretable decisions, while others are disproportionately routed to a black box. We formalize this issue as Interpretability Coverage Disparity (ICD), a demographic-parity-style measure applied to the routing decision of hybrid interpretable models. Using tools from predictive multiplicity, we study ICD across four hybrid interpretable learning methods, three standard fairness benchmark datasets, and multiple sensitive attributes. Our experiments reveal substantial ICD in intermediate transparency regimes, where both the interpretable and black-box components are actively used. We further show that simple coverage-disparity constraints can significantly reduce ICD in exact hybrid learning methods, with marginal impact on accuracy and sparsity. In several settings, ICD mitigation also improves standard algorithmic fairness metrics. These results show that hybrid interpretable models should be audited not only for predictive fairness, but also for how they allocate interpretability across individuals and groups.

AIAug 29, 2023
Probabilistic Dataset Reconstruction from Interpretable Models

Julien Ferry, Ulrich Aïvodji, Sébastien Gambs et al.

Interpretability is often pointed out as a key requirement for trustworthy machine learning. However, learning and releasing models that are inherently interpretable leaks information regarding the underlying training data. As such disclosure may directly conflict with privacy, a precise quantification of the privacy impact of such breach is a fundamental problem. For instance, previous work have shown that the structure of a decision tree can be leveraged to build a probabilistic reconstruction of its training dataset, with the uncertainty of the reconstruction being a relevant metric for the information leak. In this paper, we propose of a novel framework generalizing these probabilistic reconstructions in the sense that it can handle other forms of interpretable models and more generic types of knowledge. In addition, we demonstrate that under realistic assumptions regarding the interpretable models' structure, the uncertainty of the reconstruction can be computed efficiently. Finally, we illustrate the applicability of our approach on both decision trees and rule lists, by comparing the theoretical information leak associated to either exact or heuristic learning algorithms. Our results suggest that optimal interpretable models are often more compact and leak less information regarding their training data than greedily-built ones, for a given accuracy level.

LGJul 24, 2023
Fairness Under Demographic Scarce Regime

Patrik Joslin Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji

Most existing works on fairness assume the model has full access to demographic information. However, there exist scenarios where demographic information is partially available because a record was not maintained throughout data collection or for privacy reasons. This setting is known as demographic scarce regime. Prior research has shown that training an attribute classifier to replace the missing sensitive attributes (proxy) can still improve fairness. However, using proxy-sensitive attributes worsens fairness-accuracy tradeoffs compared to true sensitive attributes. To address this limitation, we propose a framework to build attribute classifiers that achieve better fairness-accuracy tradeoffs. Our method introduces uncertainty awareness in the attribute classifier and enforces fairness on samples with demographic information inferred with the lowest uncertainty. We show empirically that enforcing fairness constraints on samples with uncertain sensitive attributes can negatively impact the fairness-accuracy tradeoff. Our experiments on five datasets showed that the proposed framework yields models with significantly better fairness-accuracy tradeoffs than classic attribute classifiers. Surprisingly, our framework can outperform models trained with fairness constraints on the true sensitive attributes in most benchmarks. We also show that these findings are consistent with other uncertainty measures such as conformal prediction.

LGMar 8, 2023
Learning Hybrid Interpretable Models: Theory, Taxonomy, and Methods

Julien Ferry, Gabriel Laberge, Ulrich Aïvodji

A hybrid model involves the cooperation of an interpretable model and a complex black box. At inference, any input of the hybrid model is assigned to either its interpretable or complex component based on a gating mechanism. The advantages of such models over classical ones are two-fold: 1) They grant users precise control over the level of transparency of the system and 2) They can potentially perform better than a standalone black box since redirecting some of the inputs to an interpretable model implicitly acts as regularization. Still, despite their high potential, hybrid models remain under-studied in the interpretability/explainability literature. In this paper, we remedy this fact by presenting a thorough investigation of such models from three perspectives: Theory, Taxonomy, and Methods. First, we explore the theory behind the generalization of hybrid models from the Probably-Approximately-Correct (PAC) perspective. A consequence of our PAC guarantee is the existence of a sweet spot for the optimal transparency of the system. When such a sweet spot is attained, a hybrid model can potentially perform better than a standalone black box. Secondly, we provide a general taxonomy for the different ways of training hybrid models: the Post-Black-Box and Pre-Black-Box paradigms. These approaches differ in the order in which the interpretable and complex components are trained. We show where the state-of-the-art hybrid models Hybrid-Rule-Set and Companion-Rule-List fall in this taxonomy. Thirdly, we implement the two paradigms in a single method: HybridCORELS, which extends the CORELS algorithm to hybrid modeling. By leveraging CORELS, HybridCORELS provides a certificate of optimality of its interpretable component and precise control over transparency. We finally show empirically that HybridCORELS is competitive with existing hybrid models, and performs just as well as a standalone black box (or even better) while being partly transparent.

LGMay 14, 2025Code
Towards Fair In-Context Learning with Tabular Foundation Models

Patrik Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji

Transformer-based tabular foundation models have recently demonstrated promising in-context learning (ICL) performance on structured data, emerging as competitive alternatives to gradient-boosted trees. However, the fairness implications of this new paradigm remain largely unexplored. We present the first investigation of fairness in tabular ICL, evaluating three recently proposed foundation models -- TabPFNv2, TabICL, and TabDPT -- on multiple benchmark datasets. To mitigate biases, we explore three pre-processing fairness-enhancing methods: correlation removal (decorrelating input features from the sensitive attribute), group-balanced sample selection (ensuring equal representation of protected groups in context examples), and uncertainty-based sample selection (prioritizing context examples with high sensitive-attribute prediction uncertainty). Our experiments show that the uncertainty-based strategy consistently improves group fairness metrics (e.g., demographic parity, equalized odds, and equal opportunity) with minimal impact on predictive accuracy. We release our code to facilitate reproducibility (https://github.com/patrikken/Fair-TabICL)

LGNov 22, 2024Code
Adaptive Group Robust Ensemble Knowledge Distillation

Patrik Kenfack, Ulrich Aïvodji, Samira Ebrahimi Kahou

Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex teacher model to a relatively ``simple'' student model. Prior work has shown that ensemble deep learning methods can improve the performance of the worst-case subgroups; however, it is unclear if this advantage carries over when distilling knowledge from an ensemble of teachers, especially when the teacher models are debiased. This study demonstrates that traditional ensemble knowledge distillation can significantly drop the performance of the worst-case subgroups in the distilled student model even when the teacher models are debiased. To overcome this, we propose Adaptive Group Robust Ensemble Knowledge Distillation (AGRE-KD), a simple ensembling strategy to ensure that the student model receives knowledge beneficial for unknown underrepresented subgroups. Leveraging an additional biased model, our method selectively chooses teachers whose knowledge would better improve the worst-performing subgroups by upweighting the teachers with gradient directions deviating from the biased model. Our experiments on several datasets demonstrate the superiority of the proposed ensemble distillation technique and show that it can even outperform classic model ensembles based on majority voting. Our source code is available at https://github.com/patrikken/AGRE-KD

LGDec 22, 2023
SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning

Julien Ferry, Ulrich Aïvodji, Sébastien Gambs et al.

Machine learning techniques are increasingly used for high-stakes decision-making, such as college admissions, loan attribution or recidivism prediction. Thus, it is crucial to ensure that the models learnt can be audited or understood by human users, do not create or reproduce discrimination or bias, and do not leak sensitive information regarding their training data. Indeed, interpretability, fairness and privacy are key requirements for the development of responsible machine learning, and all three have been studied extensively during the last decade. However, they were mainly considered in isolation, while in practice they interplay with each other, either positively or negatively. In this Systematization of Knowledge (SoK) paper, we survey the literature on the interactions between these three desiderata. More precisely, for each pairwise interaction, we summarize the identified synergies and tensions. These findings highlight several fundamental theoretical and empirical conflicts, while also demonstrating that jointly considering these different requirements is challenging when one aims at preserving a high level of utility. To solve this issue, we also discuss possible conciliation mechanisms, showing that a careful design can enable to successfully handle these different concerns in practice.

LGMay 9, 2025
Crowding Out The Noise: Algorithmic Collective Action Under Differential Privacy

Rushabh Solanki, Meghana Bhange, Ulrich Aïvodji et al.

The integration of AI into daily life has generated considerable attention and excitement, while also raising concerns about automating algorithmic harms and re-entrenching existing social inequities. While the responsible deployment of trustworthy AI systems is a worthy goal, there are many possible ways to realize it, from policy and regulation to improved algorithm design and evaluation. In fact, since AI trains on social data, there is even a possibility for everyday users, citizens, or workers to directly steer its behavior through Algorithmic Collective Action, by deliberately modifying the data they share with a platform to drive its learning process in their favor. This paper considers how these grassroots efforts to influence AI interact with methods already used by AI firms and governments to improve model trustworthiness. In particular, we focus on the setting where the AI firm deploys a differentially private model, motivated by the growing regulatory focus on privacy and data protection. We investigate how the use of Differentially Private Stochastic Gradient Descent (DPSGD) affects the collective's ability to influence the learning process. Our findings show that while differential privacy contributes to the protection of individual data, it introduces challenges for effective algorithmic collective action. We characterize lower bounds on the success of algorithmic collective action under differential privacy as a function of the collective's size and the firm's privacy parameters, and verify these trends experimentally by simulating collective action during the training of deep neural network classifiers across several datasets.

LGJun 14, 2021
Characterizing the risk of fairwashing

Ulrich Aïvodji, Hiromi Arai, Sébastien Gambs et al.

Fairwashing refers to the risk that an unfair black-box model can be explained by a fairer model through post-hoc explanation manipulation. In this paper, we investigate the capability of fairwashing attacks by analyzing their fidelity-unfairness trade-offs. In particular, we show that fairwashed explanation models can generalize beyond the suing group (i.e., data points that are being explained), meaning that a fairwashed explainer can be used to rationalize subsequent unfair decisions of a black-box model. We also demonstrate that fairwashing attacks can transfer across black-box models, meaning that other black-box models can perform fairwashing without explicitly using their predictions. This generalization and transferability of fairwashing attacks imply that their detection will be difficult in practice. Finally, we propose an approach to quantify the risk of fairwashing, which is based on the computation of the range of the unfairness of high-fidelity explainers.

LGSep 3, 2020
Model extraction from counterfactual explanations

Ulrich Aïvodji, Alexandre Bolot, Sébastien Gambs

Post-hoc explanation techniques refer to a posteriori methods that can be used to explain how black-box machine learning models produce their outcomes. Among post-hoc explanation techniques, counterfactual explanations are becoming one of the most popular methods to achieve this objective. In particular, in addition to highlighting the most important features used by the black-box model, they provide users with actionable explanations in the form of data instances that would have received a different outcome. Nonetheless, by doing so, they also leak non-trivial information about the model itself, which raises privacy issues. In this work, we demonstrate how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks. More precisely, our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations. The empirical evaluation of the proposed attack on black-box models trained on real-world datasets demonstrates that they can achieve high-fidelity and high-accuracy extraction even under low query budgets.

LGSep 26, 2019
GAMIN: An Adversarial Approach to Black-Box Model Inversion

Ulrich Aïvodji, Sébastien Gambs, Timon Ther

Recent works have demonstrated that machine learning models are vulnerable to model inversion attacks, which lead to the exposure of sensitive information contained in their training dataset. While some model inversion attacks have been developed in the past in the black-box attack setting, in which the adversary does not have direct access to the structure of the model, few of these have been conducted so far against complex models such as deep neural networks. In this paper, we introduce GAMIN (for Generative Adversarial Model INversion), a new black-box model inversion attack framework achieving significant results even against deep models such as convolutional neural networks at a reasonable computing cost. GAMIN is based on the continuous training of a surrogate model for the target model under attack and a generator whose objective is to generate inputs resembling those used to train the target model. The attack was validated against various neural networks used as image classifiers. In particular, when attacking models trained on the MNIST dataset, GAMIN is able to extract recognizable digits for up to 60% of labels produced by the target. Attacks against skin classification models trained on the pilot parliament dataset also demonstrated the capacity to extract recognizable features from the targets.

LGSep 9, 2019
Learning Fair Rule Lists

Ulrich Aïvodji, Julien Ferry, Sébastien Gambs et al.

As the use of black-box models becomes ubiquitous in high stake decision-making systems, demands for fair and interpretable models are increasing. While it has been shown that interpretable models can be as accurate as black-box models in several critical domains, existing fair classification techniques that are interpretable by design often display poor accuracy/fairness tradeoffs in comparison with their non-interpretable counterparts. In this paper, we propose FairCORELS, a fair classification technique interpretable by design, whose objective is to learn fair rule lists. Our solution is a multi-objective variant of CORELS, a branch-and-bound algorithm to learn rule lists, that supports several statistical notions of fairness. Examples of such measures include statistical parity, equal opportunity and equalized odds. The empirical evaluation of FairCORELS on real-world datasets demonstrates that it outperforms state-of-the-art fair classification techniques that are interpretable by design while being competitive with non-interpretable ones.

LGJun 19, 2019
Adversarial training approach for local data debiasing

Ulrich Aïvodji, François Bidet, Sébastien Gambs et al.

The widespread use of automated decision processes in many areas of our society raises serious ethical issues concerning the fairness of the process and the possible resulting discriminations. In this work, we propose a novel approach called GANsan whose objective is to prevent the possibility of any discrimination i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our sanitization algorithm GANsan is partially inspired by the powerful framework of generative adversarial networks (in particular the Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data by only modifying the other attributes as little as possible and thus preserving the interpretability of the sanitized data. As a consequence, once the sanitizer is trained, it can be applied to new data, such as for instance, locally by an individual on his profile before releasing it. Finally, experiments on a real dataset demonstrate the effectiveness of the proposed approach as well as the achievable trade-off between fairness and utility.

LGJan 28, 2019
Fairwashing: the risk of rationalization

Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau et al.

Black-box explanation is the problem of explaining how a machine learning model -- whose internal logic is hidden to the auditor and generally complex -- produces its outcomes. Current approaches for solving this problem include model explanation, outcome explanation as well as model inspection. While these techniques can be beneficial by providing interpretability, they can be used in a negative manner to perform fairwashing, which we define as promoting the false perception that a machine learning model respects some ethical values. In particular, we demonstrate that it is possible to systematically rationalize decisions taken by an unfair black-box model using the model explanation as well as the outcome explanation approaches with a given fairness metric. Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule lists approximating an unfair black-box model. We empirically evaluate our rationalization technique on black-box models trained on real-world datasets and show that one can obtain rule lists with high fidelity to the black-box model while being considerably less unfair at the same time.