LGMay 26
Geometrically Constrained Outlier SynthesisDaniil Karzanov, Marcin Detyniecki
Deep neural networks for image classification often exhibit overconfidence on out-of-distribution (OOD) samples. To address this, we introduce Geometrically Constrained Outlier Synthesis (GCOS), a training-time regularization framework aimed at improving OOD robustness during inference. GCOS addresses a limitation of prior synthesis methods by generating virtual outliers in the hidden feature space that respect the learned manifold structure of in-distribution (ID) data. The synthesis proceeds in two stages: (i) a dominant-variance subspace extracted from the training features identifies geometrically informed, off-manifold directions; (ii) a conformally-inspired shell, defined by the empirical quantiles of a nonconformity score from a calibration set, adaptively controls the synthesis magnitude to produce boundary samples. The shell ensures that generated outliers are neither trivially detectable nor indistinguishable from in-distribution data, facilitating smoother learning of robust features. This is combined with a contrastive regularization objective that promotes separability of ID and OOD samples in a chosen score space, such as Mahalanobis or energy-based. Experiments demonstrate that GCOS outperforms state-of-the-art methods using standard energy-based inference on near-OOD benchmarks, defined as tasks where outliers share the same semantic domain as in-distribution data. As an exploratory extension, the framework naturally transitions to conformal OOD inference, which translates uncertainty scores into statistically valid p-values and enables thresholds with formal error guarantees, providing a pathway toward more predictable and reliable OOD detection.
AIApr 25, 2022
Integrating Prior Knowledge in Post-hoc ExplanationsAdulam Jeyasothy, Thibault Laugel, Marie-Jeanne Lesot et al.
In the field of eXplainable Artificial Intelligence (XAI), post-hoc interpretability methods aim at explaining to a user the predictions of a trained decision model. Integrating prior knowledge into such interpretability methods aims at improving the explanation understandability and allowing for personalised explanations adapted to each user. In this paper, we propose to define a cost function that explicitly integrates prior knowledge into the interpretability objectives: we present a general framework for the optimization problem of post-hoc interpretability methods, and show that user knowledge can thus be integrated to any method by adding a compatibility term in the cost function. We instantiate the proposed formalization in the case of counterfactual explanations and propose a new interpretability method called Knowledge Integration in Counterfactual Explanation (KICE) to optimize it. The paper performs an experimental study on several benchmark data sets to characterize the counterfactual instances generated by KICE, as compared to reference methods.
LGOct 27, 2023
On the Fairness ROAD: Robust Optimization for Adversarial DebiasingVincent Grari, Thibault Laugel, Tatsunori Hashimoto et al.
In the field of algorithmic fairness, significant attention has been put on group fairness criteria, such as Demographic Parity and Equalized Odds. Nevertheless, these objectives, measured as global averages, have raised concerns about persistent local disparities between sensitive groups. In this work, we address the problem of local fairness, which ensures that the predictor is unbiased not only in terms of expectations over the whole population, but also within any subregion of the feature space, unknown at training time. To enforce this objective, we introduce ROAD, a novel approach that leverages the Distributionally Robust Optimization (DRO) framework within a fair adversarial learning objective, where an adversary tries to infer the sensitive attribute from the predictions. Using an instance-level re-weighting strategy, ROAD is designed to prioritize inputs that are likely to be locally unfair, i.e. where the adversary faces the least difficulty in reconstructing the sensitive attribute. Numerical experiments demonstrate the effectiveness of our method: it achieves Pareto dominance with respect to local fairness and accuracy for a given global fairness level across three standard datasets, and also enhances fairness generalization under distribution shift.
LGAug 27, 2024
Post-processing fairness with minimal changesFederico Di Gennaro, Thibault Laugel, Vincent Grari et al.
In this paper, we introduce a novel post-processing algorithm that is both model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce minimal changes between biased and debiased predictions; a property that, while highly desirable, is rarely prioritized as an explicit objective in fairness literature. Our approach leverages a multiplicative factor applied to the logit value of probability scores produced by a black-box classifier. We demonstrate the efficacy of our method through empirical evaluations, comparing its performance against other four debiasing algorithms on two widely used datasets in fairness research.
CLMar 25
Alignment Reduces Expressed but Not Encoded Gender Bias: A Unified Framework and StudyNour Bouchouchi, Thiabult Laugel, Xavier Renard et al.
During training, Large Language Models (LLMs) learn social regularities that can lead to gender bias in downstream applications. Most mitigation efforts focus on reducing bias in generated outputs, typically evaluated on structured benchmarks, which raises two concerns: output-level evaluation does not reveal whether alignment modifies the model's underlying representations, and structured benchmarks may not reflect realistic usage scenarios. We propose a unified framework to jointly analyze intrinsic and extrinsic gender bias in LLMs using identical neutral prompts, enabling direct comparison between gender-related information encoded in internal representations and bias expressed in generated outputs. Contrary to prior work reporting weak or inconsistent correlations, we find a consistent association between latent gender information and expressed bias when measured under the unified protocol. We further examine the effect of alignment through supervised fine-tuning aimed at reducing gender bias. Our results suggest that while the latter indeed reduces expressed bias, measurable gender-related associations are still present in internal representations, and can be reactivated under adversarial prompting. Finally, we consider two realistic settings and show that debiasing effects observed on structured benchmarks do not necessarily generalize, e.g., to the case of story generation.
LGFeb 14, 2023
When mitigating bias is unfair: multiplicity and arbitrariness in algorithmic group fairnessNatasa Krco, Thibault Laugel, Vincent Grari et al.
Most research on fair machine learning has prioritized optimizing criteria such as Demographic Parity and Equalized Odds. Despite these efforts, there remains a limited understanding of how different bias mitigation strategies affect individual predictions and whether they introduce arbitrariness into the debiasing process. This paper addresses these gaps by exploring whether models that achieve comparable fairness and accuracy metrics impact the same individuals and mitigate bias in a consistent manner. We introduce the FRAME (FaiRness Arbitrariness and Multiplicity Evaluation) framework, which evaluates bias mitigation through five dimensions: Impact Size (how many people were affected), Change Direction (positive versus negative changes), Decision Rates (impact on models' acceptance rates), Affected Subpopulations (who was affected), and Neglected Subpopulations (where unfairness persists). This framework is intended to help practitioners understand the impacts of debiasing processes and make better-informed decisions regarding model selection. Applying FRAME to various bias mitigation approaches across key datasets allows us to exhibit significant differences in the behaviors of debiasing methods. These findings highlight the limitations of current fairness criteria and the inherent arbitrariness in the debiasing process.
CYApr 24
From Demographics to Survey Anchors: Evaluating LLM Agents for Modeling Retirement AttitudesRubén Garzón, Pauline Baron, Vincent Grari et al.
Large language models (LLM) agents may offer tools to predict human responses to surveys. A common technique for defining these agents uses only demographics, for example country, age, gender, employment status, income, education and marital status. We compare the predictive accuracy of demographic agents to that of survey agents defined with a larger set of in-domain survey responses. We test both approaches in predicting responses to the multidisciplinary, cross-national Survey of Health, Ageing and Retirement in Europe (SHARE), focusing on five variables from three policy-relevant constructs around personal finance. In these three constructs, we observe that, compared to survey agents trained on broader data, demographics-only agents (1) exhibited a central tendency bias, skewing answers toward population means, and (2) were unrealistically accurate, failing to reproduce the incorrect answers and "don't know" responses typical of human respondents. These performance differences are further substantiated through the replication of a hierarchical regression analysis from prior retirement planning research. Agents based solely on demographic information reproduce the outcome that financial risk tolerance, future time perspective, and knowledge of retirement planning each are predictive of retirement savings. However, only the survey-anchored agents succeed in reproducing the interaction among these three factors. These findings suggest caution in using only demographics to define LLM agents for predicting survey responses.
LGMay 22, 2024
Why do explanations fail? A typology and discussion on failures in XAIClara Bove, Thibault Laugel, Marie-Jeanne Lesot et al.
As Machine Learning models achieve unprecedented levels of performance, the XAI domain aims at making these models understandable by presenting end-users with intelligible explanations. Yet, some existing XAI approaches fail to meet expectations: several issues have been reported in the literature, generally pointing out either technical limitations or misinterpretations by users. In this paper, we argue that the resulting harms arise from a complex overlap of multiple failures in XAI, which existing ad-hoc studies fail to capture. This work therefore advocates for a holistic perspective, presenting a systematic investigation of limitations of current XAI methods and their impact on the interpretation of explanations. % By distinguishing between system-specific and user-specific failures, we propose a typological framework that helps revealing the nuanced complexities of explanation failures. Leveraging this typology, we discuss some research directions to help practitioners better understand the limitations of XAI systems and enhance the quality of ML explanations.
AIMar 3, 2025
SAKE: Steering Activations for Knowledge EditingMarco Scialanga, Thibault Laugel, Vincent Grari et al.
As Large Langue Models have been shown to memorize real-world facts, the need to update this knowledge in a controlled and efficient manner arises. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including their lack of contextual robustness and their failure to generalize to logical implications related to the fact. To overcome these issues, we propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM behavior over a whole fact-related distribution, defined as paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE is thus able to perform more robust edits than its existing counterparts.
CLFeb 20
Agentic Adversarial QA for Improving Domain-Specific LLMsVincent Grari, Ciprian Tomoiaga, Sylvain Lamprier et al.
Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these models for such domains; however, progress is constrained by the scarcity and limited coverage of high-quality, task-relevant data. To address this, synthetic data generation methods such as paraphrasing or knowledge extraction are commonly applied. Although these approaches excel at factual recall and conceptual knowledge, they suffer from two critical shortcomings: (i) they provide minimal support for interpretive reasoning capabilities in these specialized domains, and (ii) they often produce synthetic corpora that are excessively large and redundant, resulting in poor sample efficiency. To overcome these gaps, we propose an adversarial question-generation framework that produces a compact set of semantically challenging questions. These questions are constructed by comparing the outputs of the model to be adapted and a robust expert model grounded in reference documents, using an iterative, feedback-driven process designed to reveal and address comprehension gaps. Evaluation on specialized subsets of the LegalBench corpus demonstrates that our method achieves greater accuracy with substantially fewer synthetic samples.
LGNov 21, 2025
Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image GenerationAniketh Iyengar, Jiaqi Han, Boris Ruf et al.
The rapidly growing computational demands of diffusion models for image generation have raised significant concerns about energy consumption and environmental impact. While existing approaches to energy optimization focus on architectural improvements or hardware acceleration, there is a lack of principled methods to predict energy consumption across different model configurations and hardware setups. We propose an adaptation of Kaplan scaling laws to predict GPU energy consumption for diffusion models based on computational complexity (FLOPs). Our approach decomposes diffusion model inference into text encoding, iterative denoising, and decoding components, with the hypothesis that denoising operations dominate energy consumption due to their repeated execution across multiple inference steps. We conduct comprehensive experiments across four state-of-the-art diffusion models (Stable Diffusion 2, Stable Diffusion 3.5, Flux, and Qwen) on three GPU architectures (NVIDIA A100, A4000, A6000), spanning various inference configurations including resolution (256x256 to 1024x1024), precision (fp16/fp32), step counts (10-50), and classifier-free guidance settings. Our energy scaling law achieves high predictive accuracy within individual architectures (R-squared > 0.9) and exhibits strong cross-architecture generalization, maintaining high rank correlations across models and enabling reliable energy estimation for unseen model-hardware combinations. These results validate the compute-bound nature of diffusion inference and provide a foundation for sustainable AI deployment planning and carbon footprint estimation.
LGSep 30, 2025
ACT: Agentic Classification TreeVincent Grari, Tim Arni, Thibault Laugel et al.
When used in high-stakes settings, AI systems are expected to produce decisions that are transparent, interpretable, and auditable, a requirement increasingly expected by regulations. Decision trees such as CART provide clear and verifiable rules, but they are restricted to structured tabular data and cannot operate directly on unstructured inputs such as text. In practice, large language models (LLMs) are widely used for such data, yet prompting strategies such as chain-of-thought or prompt optimization still rely on free-form reasoning, limiting their ability to ensure trustworthy behaviors. We present the Agentic Classification Tree (ACT), which extends decision-tree methodology to unstructured inputs by formulating each split as a natural-language question, refined through impurity-based evaluation and LLM feedback via TextGrad. Experiments on text benchmarks show that ACT matches or surpasses prompting-based baselines while producing transparent and interpretable decision paths.
AIJul 21, 2025
Metric assessment protocol in the context of answer fluctuation on MCQ tasksEkaterina Goliakova, Xavier Renard, Marie-Jeanne Lesot et al.
Using multiple-choice questions (MCQs) has become a standard for assessing LLM capabilities efficiently. A variety of metrics can be employed for this task. However, previous research has not conducted a thorough assessment of them. At the same time, MCQ evaluation suffers from answer fluctuation: models produce different results given slight changes in prompts. We suggest a metric assessment protocol in which evaluation methodologies are analyzed through their connection with fluctuation rates, as well as original performance. Our results show that there is a strong link between existing metrics and the answer changing, even when computed without any additional prompt variants. A novel metric, worst accuracy, demonstrates the highest association on the protocol.
LGFeb 28, 2025
Controlled Model Debiasing through Minimal and Interpretable UpdatesFederico Di Gennaro, Thibault Laugel, Vincent Grari et al.
Traditional approaches to learning fair machine learning models often require rebuilding models from scratch, typically without considering potentially existing models. In a context where models need to be retrained frequently, this can lead to inconsistent model updates, as well as redundant and costly validation testing. To address this limitation, we introduce the notion of controlled model debiasing, a novel supervised learning task relying on two desiderata: that the differences between the new fair model and the existing one should be (i) minimal and (ii) interpretable. After providing theoretical guarantees to this new problem, we introduce a novel algorithm for algorithmic fairness, COMMOD, that is both model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce minimal and interpretable changes between biased and debiased predictions in a binary classification task, a property that, while highly desirable in high-stakes applications, is rarely prioritized as an explicit objective in fairness literature. Our approach combines a concept-based architecture and adversarial learning and we demonstrate through empirical results that it achieves comparable performance to state-of-the-art debiasing methods while performing minimal and interpretable prediction changes.
PMFeb 4, 2025
Regret-Optimized Portfolio Enhancement through Deep Reinforcement Learning and Future Looking RewardsDaniil Karzanov, Rubén Garzón, Mikhail Terekhov et al.
This paper introduces a novel agent-based approach for enhancing existing portfolio strategies using Proximal Policy Optimization (PPO). Rather than focusing solely on traditional portfolio construction, our approach aims to improve an already high-performing strategy through dynamic rebalancing driven by PPO and Oracle agents. Our target is to enhance the traditional 60/40 benchmark (60% stocks, 40% bonds) by employing the Regret-based Sharpe reward function. To address the impact of transaction fee frictions and prevent signal loss, we develop a transaction cost scheduler. We introduce a future-looking reward function and employ synthetic data training through a circular block bootstrap method to facilitate the learning of generalizable allocation strategies. We focus on two key evaluation measures: return and maximum drawdown. Given the high stochasticity of financial markets, we train 20 independent agents each period and evaluate their average performance against the benchmark. Our method not only enhances the performance of the existing portfolio strategy through strategic rebalancing but also demonstrates strong results compared to other baselines.
LGApr 16, 2024
OptiGrad: A Fair and more Efficient Price Elasticity Optimization via a Gradient Based LearningVincent Grari, Marcin Detyniecki
This paper presents a novel approach to optimizing profit margins in non-life insurance markets through a gradient descent-based method, targeting three key objectives: 1) maximizing profit margins, 2) ensuring conversion rates, and 3) enforcing fairness criteria such as demographic parity (DP). Traditional pricing optimization, which heavily lean on linear and semi definite programming, encounter challenges in balancing profitability and fairness. These challenges become especially pronounced in situations that necessitate continuous rate adjustments and the incorporation of fairness criteria. Specifically, indirect Ratebook optimization, a widely-used method for new business price setting, relies on predictor models such as XGBoost or GLMs/GAMs to estimate on downstream individually optimized prices. However, this strategy is prone to sequential errors and struggles to effectively manage optimizations for continuous rate scenarios. In practice, to save time actuaries frequently opt for optimization within discrete intervals (e.g., range of [-20\%, +20\%] with fix increments) leading to approximate estimations. Moreover, to circumvent infeasible solutions they often use relaxed constraints leading to suboptimal pricing strategies. The reverse-engineered nature of traditional models complicates the enforcement of fairness and can lead to biased outcomes. Our method addresses these challenges by employing a direct optimization strategy in the continuous space of rates and by embedding fairness through an adversarial predictor model. This innovation not only reduces sequential errors and simplifies the complexities found in traditional models but also directly integrates fairness measures into the commercial premium calculation. We demonstrate improved margin performance and stronger enforcement of fairness highlighting the critical need to evolve existing pricing strategies.
AIMay 10, 2023
Achieving Diversity in Counterfactual Explanations: a Review and DiscussionThibault Laugel, Adulam Jeyasothy, Marie-Jeanne Lesot et al.
In the field of Explainable Artificial Intelligence (XAI), counterfactual examples explain to a user the predictions of a trained decision model by indicating the modifications to be made to the instance so as to change its associated prediction. These counterfactual examples are generally defined as solutions to an optimization problem whose cost function combines several criteria that quantify desiderata for a good explanation meeting user needs. A large variety of such appropriate properties can be considered, as the user needs are generally unknown and differ from one user to another; their selection and formalization is difficult. To circumvent this issue, several approaches propose to generate, rather than a single one, a set of diverse counterfactual examples to explain a prediction. This paper proposes a review of the numerous, sometimes conflicting, definitions that have been proposed for this notion of diversity. It discusses their underlying principles as well as the hypotheses on the user needs they rely on and proposes to categorize them along several dimensions (explicit vs implicit, universe in which they are defined, level at which they apply), leading to the identification of further research challenges on this topic.
MLFeb 24, 2022
A Fair Pricing Model via Adversarial LearningVincent Grari, Arthur Charpentier, Marcin Detyniecki
At the core of insurance business lies classification between risky and non-risky insureds, actuarial fairness meaning that risky insureds should contribute more and pay a higher premium than non-risky or less-risky ones. Actuaries, therefore, use econometric or machine learning techniques to classify, but the distinction between a fair actuarial classification and "discrimination" is subtle. For this reason, there is a growing interest about fairness and discrimination in the actuarial community Lindholm, Richman, Tsanakas, and Wuthrich (2022). Presumably, non-sensitive characteristics can serve as substitutes or proxies for protected attributes. For example, the color and model of a car, combined with the driver's occupation, may lead to an undesirable gender bias in the prediction of car insurance prices. Surprisingly, we will show that debiasing the predictor alone may be insufficient to maintain adequate accuracy (1). Indeed, the traditional pricing model is currently built in a two-stage structure that considers many potentially biased components such as car or geographic risks. We will show that this traditional structure has significant limitations in achieving fairness. For this reason, we have developed a novel pricing model approach. Recently some approaches have Blier-Wong, Cossette, Lamontagne, and Marceau (2021); Wuthrich and Merz (2021) shown the value of autoencoders in pricing. In this paper, we will show that (2) this can be generalized to multiple pricing factors (geographic, car type), (3) it perfectly adapted for a fairness context (since it allows to debias the set of pricing components): We extend this main idea to a general framework in which a single whole pricing model is trained by generating the geographic and car pricing components needed to predict the pure premium while mitigating the unwanted bias according to the desired metric.
LGSep 10, 2021
Fairness without the sensitive attribute via Causal Variational AutoencoderVincent Grari, Sylvain Lamprier, Marcin Detyniecki
In recent years, most fairness strategies in machine learning models focus on mitigating unwanted biases by assuming that the sensitive information is observed. However this is not always possible in practice. Due to privacy purposes and var-ious regulations such as RGPD in EU, many personal sensitive attributes are frequently not collected. We notice a lack of approaches for mitigating bias in such difficult settings, in particular for achieving classical fairness objectives such as Demographic Parity and Equalized Odds. By leveraging recent developments for approximate inference, we propose an approach to fill this gap. Based on a causal graph, we rely on a new variational auto-encoding based framework named SRCVAE to infer a sensitive information proxy, that serve for bias mitigation in an adversarial fairness approach. We empirically demonstrate significant improvements over existing works in the field. We observe that the generated proxy's latent space recovers sensitive information and that our approach achieves a higher accuracy while obtaining the same level of fairness on two real datasets, as measured using com-mon fairness definitions.
LGJul 9, 2021
How to choose an Explainability Method? Towards a Methodical Implementation of XAI in PracticeTom Vermeire, Thibault Laugel, Xavier Renard et al.
Explainability is becoming an important requirement for organizations that make use of automated decision-making due to regulatory initiatives and a shift in public awareness. Various and significantly different algorithmic methods to provide this explainability have been introduced in the field, but the existing literature in the machine learning community has paid little attention to the stakeholder whose needs are rather studied in the human-computer interface community. Therefore, organizations that want or need to provide this explainability are confronted with the selection of an appropriate method for their use case. In this paper, we argue there is a need for a methodology to bridge the gap between stakeholder needs and explanation methods. We present our ongoing work on creating this methodology to help data scientists in the process of providing explainability to stakeholders. In particular, our contributions include documents used to characterize XAI methods and user requirements (shown in Appendix), which our methodology builds upon.
LGJul 9, 2021
Understanding surrogate explanations: the interplay between complexity, fidelity and coverageRafael Poyiadzi, Xavier Renard, Thibault Laugel et al.
This paper analyses the fundamental ingredients behind surrogate explanations to provide a better understanding of their inner workings. We start our exposition by considering global surrogates, describing the trade-off between complexity of the surrogate and fidelity to the black-box being modelled. We show that transitioning from global to local - reducing coverage - allows for more favourable conditions on the Pareto frontier of fidelity-complexity of a surrogate. We discuss the interplay between complexity, fidelity and coverage, and consider how different user needs can lead to problem formulations where these are either constraints or penalties. We also present experiments that demonstrate how the local surrogate interpretability procedure can be made interactive and lead to better explanations.
LGJun 10, 2021
On the overlooked issue of defining explanation objectives for local-surrogate explainersRafael Poyiadzi, Xavier Renard, Thibault Laugel et al.
Local surrogate approaches for explaining machine learning model predictions have appealing properties, such as being model-agnostic and flexible in their modelling. Several methods exist that fit this description and share this goal. However, despite their shared overall procedure, they set out different objectives, extract different information from the black-box, and consequently produce diverse explanations, that are -- in general -- incomparable. In this work we review the similarities and differences amongst multiple methods, with a particular focus on what information they extract from the model, as this has large impact on the output: the explanation. We discuss the implications of the lack of agreement, and clarity, amongst the methods' objectives on the research and practice of explainability.
AIMay 3, 2021
Explaining how your AI system is fairBoris Ruf, Marcin Detyniecki
To implement fair machine learning in a sustainable way, choosing the right fairness objective is key. Since fairness is a concept of justice which comes in various, sometimes conflicting definitions, this is not a trivial task though. The most appropriate fairness definition for an artificial intelligence (AI) system is a matter of ethical standards and legal requirements, and the right choice depends on the particular use case and its context. In this position paper, we propose to use a decision tree as means to explain and justify the implemented kind of fairness to the end users. Such a structure would first of all support AI practitioners in mapping ethical principles to fairness definitions for a concrete application and therefore make the selection a straightforward and transparent process. However, this approach would also help document the reasoning behind the decision making. Due to the general complexity of the topic of fairness in AI, we argue that specifying "fairness" for a given use case is the best way forward to maintain confidence in AI systems. In this case, this could be achieved by sharing the reasons and principles expressed during the decision making process with the broader audience.
LGApr 12, 2021
Understanding Prediction Discrepancies in Machine Learning ClassifiersXavier Renard, Thibault Laugel, Marcin Detyniecki
A multitude of classifiers can be trained on the same data to achieve similar performances during test time, while having learned significantly different classification patterns. This phenomenon, which we call prediction discrepancies, is often associated with the blind selection of one model instead of another with similar performances. When making a choice, the machine learning practitioner has no understanding on the differences between models, their limits, where they agree and where they don't. But his/her choice will result in concrete consequences for instances to be classified in the discrepancy zone, since the final decision will be based on the selected classification pattern. Besides the arbitrary nature of the result, a bad choice could have further negative consequences such as loss of opportunity or lack of fairness. This paper proposes to address this question by analyzing the prediction discrepancies in a pool of best-performing models trained on the same data. A model-agnostic algorithm, DIG, is proposed to capture and explain discrepancies locally, to enable the practitioner to make the best educated decision when selecting a model by anticipating its potential undesired consequences. All the code to reproduce the experiments is available.
LGApr 9, 2021
Implementing Fair Regression In The Real WorldBoris Ruf, Marcin Detyniecki
Most fair regression algorithms mitigate bias towards sensitive sub populations and therefore improve fairness at group level. In this paper, we investigate the impact of such implementation of fair regression on the individual. More precisely, we assess the evolution of continuous predictions from an unconstrained to a fair algorithm by comparing results from baseline algorithms with fair regression algorithms for the same data points. Based on our findings, we propose a set of post-processing algorithms to improve the utility of the existing fair regression approaches.
AIFeb 16, 2021
Towards the Right Kind of Fairness in AIBoris Ruf, Marcin Detyniecki
Fairness is a concept of justice. Various definitions exist, some of them conflicting with each other. In the absence of an uniformly accepted notion of fairness, choosing the right kind for a specific situation has always been a central issue in human history. When it comes to implementing sustainable fairness in artificial intelligence systems, this old question plays a key role once again: How to identify the most appropriate fairness metric for a particular application? The answer is often a matter of context, and the best choice depends on ethical standards and legal requirements. Since ethics guidelines on this topic are kept rather general for now, we aim to provide more hands-on guidance with this document. Therefore, we first structure the complex landscape of existing fairness metrics and explain the different options by example. Furthermore, we propose the "Fairness Compass", a tool which formalises the selection process and makes identifying the most appropriate fairness definition for a given system a simple, straightforward procedure. Because this process also allows to document the reasoning behind the respective decisions, we argue that this approach can help to build trust from the user through explaining and justifying the implemented fairness.
CLDec 24, 2020
QUACKIE: A NLP Classification Task With Ground Truth ExplanationsYves Rychener, Xavier Renard, Djamé Seddah et al.
NLP Interpretability aims to increase trust in model predictions. This makes evaluating interpretability approaches a pressing issue. There are multiple datasets for evaluating NLP Interpretability, but their dependence on human provided ground truths raises questions about their unbiasedness. In this work, we take a different approach and formulate a specific classification task by diverting question-answering datasets. For this custom classification task, the interpretability ground-truth arises directly from the definition of the classification problem. We use this method to propose a benchmark and lay the groundwork for future research in NLP interpretability by evaluating a wide range of current state of the art methods.
CLDec 24, 2020
On the Granularity of Explanations in Model Agnostic NLP InterpretabilityYves Rychener, Xavier Renard, Djamé Seddah et al.
Current methods for Black-Box NLP interpretability, like LIME or SHAP, are based on altering the text to interpret by removing words and modeling the Black-Box response. In this paper, we outline limitations of this approach when using complex BERT-based classifiers: The word-based sampling produces texts that are out-of-distribution for the classifier and further gives rise to a high-dimensional search space, which can't be sufficiently explored when time or computation power is limited. Both of these challenges can be addressed by using segments as elementary building blocks for NLP interpretability. As illustration, we show that the simple choice of sentences greatly improves on both of these challenges. As a consequence, the resulting explainer attains much better fidelity on a benchmark classification task.
AISep 14, 2020
Active Fairness Instead of UnawarenessBoris Ruf, Marcin Detyniecki
The possible risk that AI systems could promote discrimination by reproducing and enforcing unwanted bias in data has been broadly discussed in research and society. Many current legal standards demand to remove sensitive attributes from data in order to achieve "fairness through unawareness". We argue that this approach is obsolete in the era of big data where large datasets with highly correlated attributes are common. In the contrary, we propose the active use of sensitive attributes with the purpose of observing and controlling any kind of discrimination, and thus leading to fair results.
LGSep 7, 2020
Learning Unbiased Representations via Rényi MinimizationVincent Grari, Oualid El Hajouji, Sylvain Lamprier et al.
In recent years, significant work has been done to include fairness constraints in the training objective of machine learning algorithms. Many state-of the-art algorithms tackle this challenge by learning a fair representation which captures all the relevant information to predict the output Y while not containing any information about a sensitive attribute S. In this paper, we propose an adversarial algorithm to learn unbiased representations via the Hirschfeld-Gebelein-Renyi (HGR) maximal correlation coefficient. We leverage recent work which has been done to estimate this coefficient by learning deep neural network transformations and use it as a minmax game to penalize the intrinsic bias in a multi dimensional latent representation. Compared to other dependence measures, the HGR coefficient captures more information about the non-linear dependencies with the sensitive variable, making the algorithm more efficient in mitigating bias in the representation. We empirically evaluate and compare our approach and demonstrate significant improvements over existing works in the field.
LGAug 30, 2020
Adversarial Learning for Counterfactual FairnessVincent Grari, Sylvain Lamprier, Marcin Detyniecki
In recent years, fairness has become an important topic in the machine learning research community. In particular, counterfactual fairness aims at building prediction models which ensure fairness at the most individual level. Rather than globally considering equity over the entire population, the idea is to imagine what any individual would look like with a variation of a given attribute of interest, such as a different gender or race for instance. Existing approaches rely on Variational Auto-encoding of individuals, using Maximum Mean Discrepancy (MMD) penalization to limit the statistical dependence of inferred representations with their corresponding sensitive attributes. This enables the simulation of counterfactual samples used for training the target fair model, the goal being to produce similar outcomes for every alternate version of any individual. In this work, we propose to rely on an adversarial neural learning approach, that enables more powerful inference than with MMD penalties, and is particularly better fitted for the continuous setting, where values of sensitive attributes cannot be exhaustively enumerated. Experiments show significant improvements in term of counterfactual fairness for both the discrete and the continuous settings.
SPAug 27, 2020
Vehicle Telematics Via Exteroceptive Sensors: A SurveyFernando Molano Ortiz, Matteo Sammarco, Luís Henrique M. K. Costa et al.
Whereas a very large number of sensors are available in the automotive field, currently just a few of them, mostly proprioceptive ones, are used in telematics, automotive insurance, and mobility safety research. In this paper, we show that exteroceptive sensors, like microphones or cameras, could replace proprioceptive ones in many fields. Our main motivation is to provide the reader with alternative ideas for the development of telematics applications when proprioceptive sensors are unusable for technological issues, privacy concerns, or lack of availability in commercial devices. We first introduce a taxonomy of sensors in telematics. Then, we review in detail all exteroceptive sensors of some interest for vehicle telematics, highlighting advantages, drawbacks, and availability in off-the-shelf devices. Successively, we present a list of notable telematics services and applications in research and industry like driving profiling or vehicular safety. For each of them, we report the most recent and important works relying on exteroceptive sensors, as long as the available datasets. We conclude showing open challenges using exteroceptive sensors both for industry and research.
AIMar 15, 2020
Getting Fairness Right: Towards a Toolbox for PractitionersBoris Ruf, Chaouki Boutharouite, Marcin Detyniecki
The potential risk of AI systems unintentionally embedding and reproducing bias has attracted the attention of machine learning practitioners and society at large. As policy makers are willing to set the standards of algorithms and AI techniques, the issue on how to refine existing regulation, in order to enforce that decisions made by automated systems are fair and non-discriminatory, is again critical. Meanwhile, researchers have demonstrated that the various existing metrics for fairness are statistically mutually exclusive and the right choice mostly depends on the use case and the definition of fairness. Recognizing that the solutions for implementing fair AI are not purely mathematical but require the commitments of the stakeholders to define the desired nature of fairness, this paper proposes to draft a toolbox which helps practitioners to ensure fair AI practices. Based on the nature of the application and the available training data, but also on legal requirements and ethical, philosophical and cultural dimensions, the toolbox aims to identify the most appropriate fairness objective. This approach attempts to structure the complex landscape of fairness metrics and, therefore, makes the different available options more accessible to non-technical people. In the proven absence of a silver bullet solution for fair AI, this toolbox intends to produce the fairest AI systems possible with respect to their local context.
LGNov 13, 2019
Fair Adversarial Gradient Tree BoostingVincent Grari, Boris Ruf, Sylvain Lamprier et al.
Fair classification has become an important topic in machine learning research. While most bias mitigation strategies focus on neural networks, we noticed a lack of work on fair classifiers based on decision trees even though they have proven very efficient. In an up-to-date comparison of state-of-the-art classification algorithms in tabular data, tree boosting outperforms deep learning. For this reason, we have developed a novel approach of adversarial gradient tree boosting. The objective of the algorithm is to predict the output $Y$ with gradient tree boosting while minimizing the ability of an adversarial neural network to predict the sensitive attribute $S$. The approach incorporates at each iteration the gradient of the neural network directly in the gradient tree boosting. We empirically assess our approach on 4 popular data sets and compare against state-of-the-art algorithms. The results show that our algorithm achieves a higher accuracy while obtaining the same level of fairness, as measured using a set of different common fairness definitions.
LGNov 12, 2019
Fairness-Aware Neural Réyni Minimization for Continuous FeaturesVincent Grari, Boris Ruf, Sylvain Lamprier et al.
The past few years have seen a dramatic rise of academic and societal interest in fair machine learning. While plenty of fair algorithms have been proposed recently to tackle this challenge for discrete variables, only a few ideas exist for continuous ones. The objective in this paper is to ensure some independence level between the outputs of regression models and any given continuous sensitive variables. For this purpose, we use the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation coefficient as a fairness metric. We propose two approaches to minimize the HGR coefficient. First, by reducing an upper bound of the HGR with a neural network estimation of the $χ^{2}$ divergence. Second, by minimizing the HGR directly with an adversarial neural network architecture. The idea is to predict the output Y while minimizing the ability of an adversarial neural network to find the estimated transformations which are required to predict the HGR coefficient. We empirically assess and compare our approaches and demonstrate significant improvements on previously presented work in the field.
MLNov 8, 2019
Imperceptible Adversarial Attacks on Tabular DataVincent Ballet, Xavier Renard, Jonathan Aigrain et al.
Security of machine learning models is a concern as they may face adversarial attacks for unwarranted advantageous decisions. While research on the topic has mainly been focusing on the image domain, numerous industrial applications, in particular in finance, rely on standard tabular data. In this paper, we discuss the notion of adversarial examples in the tabular domain. We propose a formalization based on the imperceptibility of attacks in the tabular domain leading to an approach to generate imperceptible adversarial examples. Experiments show that we can generate imperceptible adversarial examples with a high fooling rate.
AIOct 10, 2019
Contract Statements Knowledge Service for ChatbotsBoris Ruf, Matteo Sammarco, Marcin Detyniecki
Towards conversational agents that are capable of handling more complex questions on contractual conditions, formalizing contract statements in a machine readable way is crucial. However, constructing a formal model which captures the full scope of a contract proves difficult due to the overall complexity its set of rules represent. Instead, this paper presents a top-down approach to the problem. After identifying the most relevant contract statements, we model their underlying rules in a novel knowledge engineering method. A user-friendly tool we developed for this purpose allows to do so easily and at scale. Then, we expose the statements as service so they can get smoothly integrated in any chatbot framework.
LGJul 22, 2019
The Dangers of Post-hoc Interpretability: Unjustified Counterfactual ExplanationsThibault Laugel, Marie-Jeanne Lesot, Christophe Marsala et al.
Post-hoc interpretability approaches have been proven to be powerful tools to generate explanations for the predictions made by a trained black-box model. However, they create the risk of having explanations that are a result of some artifacts learned by the model instead of actual knowledge from the data. This paper focuses on the case of counterfactual explanations and asks whether the generated instances can be justified, i.e. continuously connected to some ground-truth data. We evaluate the risk of generating unjustified counterfactual examples by investigating the local neighborhoods of instances whose predictions are to be explained and show that this risk is quite high for several datasets. Furthermore, we show that most state of the art approaches do not differentiate justified from unjustified counterfactual examples, leading to less useful explanations.
LGJun 11, 2019
Issues with post-hoc counterfactual explanations: a discussionThibault Laugel, Marie-Jeanne Lesot, Christophe Marsala et al.
Counterfactual post-hoc interpretability approaches have been proven to be useful tools to generate explanations for the predictions of a trained blackbox classifier. However, the assumptions they make about the data and the classifier make them unreliable in many contexts. In this paper, we discuss three desirable properties and approaches to quantify them: proximity, connectedness and stability. In addition, we illustrate that there is a risk for post-hoc counterfactual approaches to not satisfy these properties.
MLJun 4, 2019
Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision TreesXavier Renard, Nicolas Woloszko, Jonathan Aigrain et al.
Interpretable surrogates of black-box predictors trained on high-dimensional tabular datasets can struggle to generate comprehensible explanations in the presence of correlated variables. We propose a model-agnostic interpretable surrogate that provides global and local explanations of black-box classifiers to address this issue. We introduce the idea of concepts as intuitive groupings of variables that are either defined by a domain expert or automatically discovered using correlation coefficients. Concepts are embedded in a surrogate decision tree to enhance its comprehensibility. First experiments on FRED-MD, a macroeconomic database with 134 variables, show improvement in human-interpretability while accuracy and fidelity of the surrogate model are preserved.
LGMay 22, 2019
Detecting Adversarial Examples and Other Misclassifications in Neural Networks by IntrospectionJonathan Aigrain, Marcin Detyniecki
Despite having excellent performances for a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value allowing to detect misclassifications. This limitation is at the heart of what is known as an adversarial example, where the network provides a wrong prediction associated with a strong confidence to a slightly modified image. Moreover, this overconfidence issue has also been observed for regular errors and out-of-distribution data. We tackle this problem by what we call introspection, i.e. using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layers neural network on top of the logit activations, we are able to detect misclassifications at a competitive level.
MLSep 7, 2018
Detecting Potential Local Adversarial Examples for Human-Interpretable DefenseXavier Renard, Thibault Laugel, Marie-Jeanne Lesot et al.
Machine learning models are increasingly used in the industry to make decisions such as credit insurance approval. Some people may be tempted to manipulate specific variables, such as the age or the salary, in order to get better chances of approval. In this ongoing work, we propose to discuss, with a first proposition, the issue of detecting a potential local adversarial example on classical tabular data by providing to a human expert the locally critical features for the classifier's decision, in order to control the provided information and avoid a fraud.
LGJun 19, 2018
Defining Locality for Surrogates in Post-hoc InterpretablityThibault Laugel, Xavier Renard, Marie-Jeanne Lesot et al.
Local surrogate models, to approximate the local decision boundary of a black-box classifier, constitute one approach to generate explanations for the rationale behind an individual prediction made by the back-box. This paper highlights the importance of defining the right locality, the neighborhood on which a local surrogate is trained, in order to approximate accurately the local black-box decision boundary. Unfortunately, as shown in this paper, this issue is not only a parameter or sampling distribution challenge and has a major impact on the relevance and quality of the approximation of the local black-box decision boundary and thus on the meaning and accuracy of the generated explanation. To overcome the identified problems, quantified with an adapted measure and procedure, we propose to generate surrogate-based explanations for individual predictions based on a sampling centered on particular place of the decision boundary, relevant for the prediction to be explained, rather than on the prediction itself as it is classically done. We evaluate the novel approach compared to state-of-the-art methods and a straightforward improvement thereof on four UCI datasets.
MLDec 22, 2017
Inverse Classification for Comparison-based Interpretability in Machine LearningThibault Laugel, Marie-Jeanne Lesot, Christophe Marsala et al.
In the context of post-hoc interpretability, this paper addresses the task of explaining the prediction of a classifier, considering the case where no information is available, neither on the classifier itself, nor on the processed data (neither the training nor the test data). It proposes an instance-based approach whose principle consists in determining the minimal changes needed to alter a prediction: given a data point whose classification must be explained, the proposed method consists in identifying a close neighbour classified differently, where the closeness definition integrates a sparsity constraint. This principle is implemented using observation generation in the Growing Spheres algorithm. Experimental results on two datasets illustrate the relevance of the proposed approach that can be used to gain knowledge about the classifier.