Laura Arnal

LGJan 12

Explaining Machine Learning Predictive Models through Conditional Expectation Methods

Silvia Ruiz-España, Laura Arnal, François Signol et al.

The rapid adoption of complex Artificial Intelligence (AI) and Machine Learning (ML) models has led to their characterization as black boxes due to the difficulty of explaining their internal decision-making processes. This lack of transparency hinders users' ability to understand, validate and trust model behavior, particularly in high-risk applications. Although explainable AI (XAI) has made significant progress, there remains a need for versatile and effective techniques to address increasingly complex models. This work introduces Multivariate Conditional Expectation (MUCE), a model-agnostic method for local explainability designed to capture prediction changes from feature interactions. MUCE extends Individual Conditional Expectation (ICE) by exploring a multivariate grid of values in the neighborhood of a given observation at inference time, providing graphical explanations that illustrate the local evolution of model predictions. In addition, two quantitative indices, stability and uncertainty, summarize local behavior and assess model reliability. Uncertainty is further decomposed into uncertainty+ and uncertainty- to capture asymmetric effects that global measures may overlook. The proposed method is validated using XGBoost models trained on three datasets: two synthetic (2D and 3D) to evaluate behavior near decision boundaries, and one transformed real-world dataset to test adaptability to heterogeneous feature types. Results show that MUCE effectively captures complex local model behavior, while the stability and uncertainty indices provide meaningful insight into prediction confidence. MUCE, together with the ICE modification and the proposed indices, offers a practical contribution to local explainability, enabling both graphical and quantitative insights that enhance the interpretability of predictive models and support more trustworthy and transparent decision-making.

LGJul 16, 2024

ITI-IQA: a Toolbox for Heterogeneous Univariate and Multivariate Missing Data Imputation Quality Assessment

Pedro Pons-Suñer, Laura Arnal, J. Ramón Navarro-Cerdán et al.

Missing values are a major challenge in most data science projects working on real data. To avoid losing valuable information, imputation methods are used to fill in missing values with estimates, allowing the preservation of samples or variables that would otherwise be discarded. However, if the process is not well controlled, imputation can generate spurious values that introduce uncertainty and bias into the learning process. The abundance of univariate and multivariate imputation techniques, along with the complex trade-off between data reliability and preservation, makes it difficult to determine the best course of action to tackle missing values. In this work, we present ITI-IQA (Imputation Quality Assessment), a set of utilities designed to assess the reliability of various imputation methods, select the best imputer for any feature or group of features, and filter out features that do not meet quality criteria. Statistical tests are conducted to evaluate the suitability of every tested imputer, ensuring that no new biases are introduced during the imputation phase. The result is a trainable pipeline of filters and imputation methods that streamlines the process of dealing with missing data, supporting different data types: continuous, discrete, binary, and categorical. The toolbox also includes a suite of diagnosing methods and graphical tools to check measurements and results during and after handling missing data.

Laura Arnal

2 Papers