Przemyslaw Biecek

LG
h-index69
50papers
3,674citations
Novelty44%
AI Score58

50 Papers

CRJun 6, 2023
Adversarial attacks and defenses in explainable artificial intelligence: A survey

Hubert Baniecki, Przemyslaw Biecek

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods, putting their security and trustworthiness into question. The possibility of manipulating, fooling or fairwashing evidence of the model's reasoning has detrimental consequences when applied in high-stakes decision-making and knowledge discovery. This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models, as well as fairness metrics. We introduce a unified notation and taxonomy of methods facilitating a common ground for researchers and practitioners from the intersecting research fields of AdvML and XAI. We discuss how to defend against attacks and design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline the emerging research directions in adversarial XAI (AdvXAI). Future work should address improving explanation methods and evaluation protocols to take into account the reported safety issues.

IVJul 10, 2024Code
Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation

Szymon Płotka, Maciej Chrabaszcz, Przemyslaw Biecek

Recent advances in Vision Transformers (ViTs) have significantly enhanced medical image segmentation by facilitating the learning of global relationships. However, these methods face a notable challenge in capturing diverse local and global long-range sequential feature representations, particularly evident in whole-body CT (WBCT) scans. To overcome this limitation, we introduce Swin Soft Mixture Transformer (Swin SMT), a novel architecture based on Swin UNETR. This model incorporates a Soft Mixture-of-Experts (Soft MoE) to effectively handle complex and diverse long-range dependencies. The use of Soft MoE allows for scaling up model parameters maintaining a balance between computational complexity and segmentation performance in both training and inference modes. We evaluate Swin SMT on the publicly available TotalSegmentator-V2 dataset, which includes 117 major anatomical structures in WBCT images. Comprehensive experimental results demonstrate that Swin SMT outperforms several state-of-the-art methods in 3D anatomical structure segmentation, achieving an average Dice Similarity Coefficient of 85.09%. The code and pre-trained weights of Swin SMT are publicly available at https://github.com/MI2DataLab/SwinSMT.

66.0AIMay 31
The Case for Model Science: Verify, Explore, Steer, Refine

Przemyslaw Biecek, Luca Longo, Jianlong Zhou et al.

We argue that the AI community is now ready to move beyond benchmarking and consolidate scattered efforts in model analysis into a systematic discipline, a direction we term Model Science. Complex AI models now serve billions of users, yet our understanding of how they work lags far behind our ability to deploy them. Decades of benchmark-driven research have delivered remarkable progress: extensive leaderboards, a wide range of performance metrics, tracking capability gains across diverse tasks; yet this success has also revealed the limits of benchmarks as they tell us whether models perform but not why they succeed or fail, they miss critical failure modes, such as hallucinations or shortcuts. Precedents from established sciences point the way forward: cognitive science shows that understanding complex systems requires complementary levels of analysis; neuroscience demonstrates that deep study of single cases reveals what population studies miss; medicine teaches that specialised training must develop alongside research practice; and agriculture models how shared infrastructure and principles enable cumulative progress. These lessons inform three foundations for Model Science. First, we propose to consolidate research around four functional perspectives: Verify, Explore, Steer, and Refine that address complementary questions about model behaviour. Second, we discuss the required infrastructure for cumulative knowledge: catalogues of datasets, models and findings. Third, we highlight the need for deep analysis of individual model instances, not just model families, because single cases can reveal what population studies miss.

CVJul 23, 2024Code
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models

Maciej Chrabaszcz, Hubert Baniecki, Piotr Komorowski et al.

Analysis of 3D segmentation models, especially in the context of medical imaging, is often limited to segmentation performance metrics that overlook the crucial aspect of explainability and bias. Currently, effectively explaining these models with saliency maps is challenging due to the high dimensions of input images multiplied by the ever-growing number of segmented class labels. To this end, we introduce Agg^2Exp, a methodology for aggregating fine-grained voxel attributions of the segmentation model's predictions. Unlike classical explanation methods that primarily focus on the local feature attribution, Agg^2Exp enables a more comprehensive global view on the importance of predicted segments in 3D images. Our benchmarking experiments show that gradient-based voxel attributions are more faithful to the model's predictions than perturbation-based explanations. As a concrete use-case, we apply Agg^2Exp to discover knowledge acquired by the Swin UNEt TRansformer model trained on the TotalSegmentator v2 dataset for segmenting anatomical structures in computed tomography medical images. Agg^2Exp facilitates the explanatory analysis of large segmentation models beyond their predictive performance. The source code is publicly available at https://github.com/mi2datalab/agg2exp.

CVMar 17, 2023
Interpretable machine learning for time-to-event prediction in medicine and healthcare

Hubert Baniecki, Bartlomiej Sobieski, Patryk Szatkowski et al.

Time-to-event prediction, e.g. cancer survival analysis or hospital length of stay, is a highly prominent machine learning task in medical and healthcare applications. However, only a few interpretable machine learning methods comply with its challenges. To facilitate a comprehensive explanatory analysis of survival models, we formally introduce time-dependent feature effects and global feature importance explanations. We show how post-hoc interpretation methods allow for finding biases in AI systems predicting length of stay using a novel multi-modal dataset created from 1235 X-ray images with textual radiology reports annotated by human experts. Moreover, we evaluate cancer survival models beyond predictive performance to include the importance of multi-omics feature groups based on a large-scale benchmark comprising 11 datasets from The Cancer Genome Atlas (TCGA). Model developers can use the proposed methods to debug and improve machine learning algorithms, while physicians can discover disease biomarkers and assess their significance. We hope the contributed open data and code resources facilitate future work in the emerging research direction of explainable survival analysis.

MLFeb 26, 2023
Performance is not enough: the story told by a Rashomon quartet

Przemyslaw Biecek, Hubert Baniecki, Mateusz Krzyzinski et al.

The usual goal of supervised learning is to find the best model, the one that optimizes a particular performance measure. However, what if the explanation provided by this model is completely different from another model and different again from another model despite all having similarly good fit statistics? Is it possible that the equally effective models put the spotlight on different relationships in the data? Inspired by Anscombe's quartet, this paper introduces a Rashomon Quartet, i.e. a set of four models built on a synthetic dataset which have practically identical predictive performance. However, the visual exploration reveals distinct explanations of the relations in the data. This illustrative example aims to encourage the use of methods for model visualization to compare predictive models beyond their performance.

LGAug 29, 2023
Glocal Explanations of Expected Goal Models in Soccer

Mustafa Cavus, Adrian Stando, Przemyslaw Biecek

The expected goal models have gained popularity, but their interpretability is often limited, especially when trained using black-box methods. Explainable artificial intelligence tools have emerged to enhance model transparency and extract descriptive knowledge for a single observation or for all observations. However, explaining black-box models for a specific group of observations may be more useful in some domains. This paper introduces the glocal explanations (between local and global levels) of the expected goal models to enable performance analysis at the team and player levels by proposing the use of aggregated versions of the SHAP values and partial dependence profiles. This allows knowledge to be extracted from the expected goal model for a player or team rather than just a single shot. In addition, we conducted real-data applications to illustrate the usefulness of aggregated SHAP and aggregated profiles. The paper concludes with remarks on the potential of these explanations for performance analysis in soccer analytics.

AIJul 15, 2023
Explaining and visualizing black-box models through counterfactual paths

Bastian Pfeifer, Mateusz Krzyzinski, Hubert Baniecki et al.

Explainable AI (XAI) is an increasingly important area of machine learning research, which aims to make black-box models transparent and interpretable. In this paper, we propose a novel approach to XAI that uses the so-called counterfactual paths generated by conditional permutations of features. The algorithm measures feature importance by identifying sequential permutations of features that most influence changes in model predictions. It is particularly suitable for generating explanations based on counterfactual paths in knowledge graphs incorporating domain knowledge. Counterfactual paths introduce an additional graph dimension to current XAI methods in both explaining and visualizing black-box models. Experiments with synthetic and medical data demonstrate the practical applicability of our approach.

CVNov 8, 2023
Be Careful When Evaluating Explanations Regarding Ground Truth

Hubert Baniecki, Maciej Chrabaszcz, Andreas Holzinger et al.

Evaluating explanations of image classifiers regarding ground truth, e.g. segmentation masks defined by human perception, primarily evaluates the quality of the models under consideration rather than the explanation methods themselves. Driven by this observation, we propose a framework for $\textit{jointly}$ evaluating the robustness of safety-critical systems that $\textit{combine}$ a deep neural network with an explanation method. These are increasingly used in real-world applications like medical image analysis or robotics. We introduce a fine-tuning procedure to (mis)align model$\unicode{x2013}$explanation pipelines with ground truth and use it to quantify the potential discrepancy between worst and best-case scenarios of human alignment. Experiments across various model architectures and post-hoc local interpretation methods provide insights into the robustness of vision transformers and the overall vulnerability of such AI systems to potential adversarial attacks.

MLMay 11, 2022
Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections

Nicholas Spyrison, Dianne Cook, Przemyslaw Biecek

The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI) which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model use predictors to arrive at a prediction. These provide a point estimate of the linear variable importance in the vicinity of a single observation. However, LVAs tend not to effectively handle association between predictors. To understand how the interaction between predictors affects the variable importance estimate, we can convert LVAs into linear projections and use the radial tour. This is also useful for learning how a model has made a mistake, or the effect of outliers, or the clustering of observations. The approach is illustrated with examples from categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) response models. The methods are implemented in the R package cheem, available on CRAN.

LGMar 12, 2023
Challenges facing the explainability of age prediction models: case study for two modalities

Mikolaj Spytek, Weronika Hryniewska-Guzik, Jaroslaw Zygierewicz et al.

The prediction of age is a challenging task with various practical applications in high-impact fields like the healthcare domain or criminology. Despite the growing number of models and their increasing performance, we still know little about how these models work. Numerous examples of failures of AI systems show that performance alone is insufficient, thus, new methods are needed to explore and explain the reasons for the model's predictions. In this paper, we investigate the use of Explainable Artificial Intelligence (XAI) for age prediction focusing on two specific modalities, EEG signal and lung X-rays. We share predictive models for age to facilitate further research on new techniques to explain models for these modalities.

LGAug 21, 2022
Performance, Opaqueness, Consequences, and Assumptions: Simple questions for responsible planning of machine learning solutions

Przemyslaw Biecek

The data revolution has generated a huge demand for data-driven solutions. This demand propels a growing number of easy-to-use tools and training for aspiring data scientists that enable the rapid building of predictive models. Today, weapons of math destruction can be easily built and deployed without detailed planning and validation. This rapidly extends the list of AI failures, i.e. deployments that lead to financial losses or even violate democratic values such as equality, freedom and justice. The lack of planning, rules and standards around the model development leads to the ,,anarchisation of AI". This problem is reported under different names such as validation debt, reproducibility crisis, and lack of explainability. Post-mortem analysis of AI failures often reveals mistakes made in the early phase of model development or data acquisition. Thus, instead of curing the consequences of deploying harmful models, we shall prevent them as early as possible by putting more attention to the initial planning stage. In this paper, we propose a quick and simple framework to support planning of AI solutions. The POCA framework is based on four pillars: Performance, Opaqueness, Consequences, and Assumptions. It helps to set the expectations and plan the constraints for the AI solution before any model is built and any data is collected. With the help of the POCA method, preliminary requirements can be defined for the model-building process, so that costly model misspecification errors can be identified as soon as possible or even avoided. AI researchers, product owners and business analysts can use this framework in the initial stages of building AI solutions.

CVFeb 27, 2025Code
Interpreting CLIP with Hierarchical Sparse Autoencoders

Vladimir Zaigrajew, Hubert Baniecki, Przemyslaw Biecek

Sparse autoencoders (SAEs) are useful for detecting and steering interpretable features in neural networks, with particular potential for understanding complex multimodal representations. Given their ability to uncover interpretable features, SAEs are particularly valuable for analyzing large-scale vision-language models (e.g., CLIP and SigLIP), which are fundamental building blocks in modern systems yet remain challenging to interpret and control. However, current SAE methods are limited by optimizing both reconstruction quality and sparsity simultaneously, as they rely on either activation suppression or rigid sparsity constraints. To this end, we introduce Matryoshka SAE (MSAE), a new architecture that learns hierarchical representations at multiple granularities simultaneously, enabling a direct optimization of both metrics without compromise. MSAE establishes a new state-of-the-art Pareto frontier between reconstruction quality and sparsity for CLIP, achieving 0.99 cosine similarity and less than 0.1 fraction of variance unexplained while maintaining ~80% sparsity. Finally, we demonstrate the utility of MSAE as a tool for interpreting and controlling CLIP by extracting over 120 semantic concepts from its representation to perform concept-based similarity search and bias analysis in downstream tasks like CelebA. We make the codebase available at https://github.com/WolodjaZ/MSAE.

LGJan 30
Auditing Sybil: Explaining Deep Lung Cancer Risk Prediction Through Generative Interventional Attributions

Bartlomiej Sobieski, Jakub Grzywaczewski, Karol Dobiczek et al.

Lung cancer remains the leading cause of cancer mortality, driving the development of automated screening tools to alleviate radiologist workload. Standing at the frontier of this effort is Sybil, a deep learning model capable of predicting future risk solely from computed tomography (CT) with high precision. However, despite extensive clinical validation, current assessments rely purely on observational metrics. This correlation-based approach overlooks the model's actual reasoning mechanism, necessitating a shift to causal verification to ensure robust decision-making before clinical deployment. We propose S(H)NAP, a model-agnostic auditing framework that constructs generative interventional attributions validated by expert radiologists. By leveraging realistic 3D diffusion bridge modeling to systematically modify anatomical features, our approach isolates object-specific causal contributions to the risk score. Providing the first interventional audit of Sybil, we demonstrate that while the model often exhibits behavior akin to an expert radiologist, differentiating malignant pulmonary nodules from benign ones, it suffers from critical failure modes, including dangerous sensitivity to clinically unjustified artifacts and a distinct radial bias.

53.5CVMay 10
SwordBench: Evaluating Orthogonality of Steering Image Representations

Vladimir Zaigrajew, Dawid Pludowski, Hubert Baniecki et al.

Steering or intervening on model representations at inference time to correct predictions is essential for AI interpretability and safety, yet existing evaluation protocols are limited to ambiguous language modeling tasks. To address this gap, we introduce SwordBench, a benchmark for steering image representations of vision models across multiple backbones and concept removal tasks. Beyond a unified benchmarking suite, we propose new evaluation notions that uncover the second-order effects of orthogonalization among concept activation vectors for pragmatic steering. Specifically, cross-concept robustness measures the stability of concept detection performance across inputs orthogonalized against alternative concepts, and collateral damage quantifies whether steering inadvertently affects model performance on a downstream task for inputs lacking the bias. We find that although a linear support vector machine exhibits superior separability and orthogonality, it fails to achieve zero collateral damage, often trailing sparse autoencoders. In simpler regimes, both standard baselines and optimization-based methods fail to achieve perfect steering. The source code will be made available soon on GitHub.

LGMay 1, 2020Code
The Grammar of Interactive Explanatory Model Analysis

Hubert Baniecki, Dariusz Parzych, Przemyslaw Biecek

The growing need for in-depth analysis of predictive models leads to a series of new methods for explaining their local and global properties. Which of these methods is the best? It turns out that this is an ill-posed question. One cannot sufficiently explain a black-box machine learning model using a single method that gives only one perspective. Isolated explanations are prone to misunderstanding, leading to wrong or simplistic reasoning. This problem is known as the Rashomon effect and refers to diverse, even contradictory, interpretations of the same phenomenon. Surprisingly, most methods developed for explainable and responsible machine learning focus on a single-aspect of the model behavior. In contrast, we showcase the problem of explainability as an interactive and sequential analysis of a model. This paper proposes how different Explanatory Model Analysis (EMA) methods complement each other and discusses why it is essential to juxtapose them. The introduced process of Interactive EMA (IEMA) derives from the algorithmic side of explainable machine learning and aims to embrace ideas developed in cognitive sciences. We formalize the grammar of IEMA to describe potential human-model dialogues. It is implemented in a widely used human-centered open-source software framework that adopts interactivity, customizability and automation as its main traits. We conduct a user study to evaluate the usefulness of IEMA, which indicates that an interactive sequential analysis of a model increases the performance and confidence of human decision making.

87.1LGMay 7
Attributions All the Way Down? The Metagame of Interpretability

Hubert Baniecki, Przemyslaw Biecek, Fabian Fumagalli

We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $ϕ(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the attribution of feature $i$, denoted as meta-attribution $φ_{j \to i}(f)$, by treating the attribution method itself as a cooperative game and computing its Shapley value. Theoretically, we prove that attributions hierarchically decompose into meta-attributions, and establish these as directional extensions of existing interaction indices. Empirically, we demonstrate that the metagame delivers insights across diverse interpretability applications: (i) quantifying token interactions in instruction-tuned language models, (ii) explaining cross-modal similarity in vision-language encoders, and (iii) interpreting text-to-image concepts in multimodal diffusion transformers.

41.5CVMay 6
Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models

Bartlomiej Sobieski, Matthew Tivnan, Dawid Płudowski et al.

Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research studied this failure mode from several viewpoints, offering partial explanations to their occurrence, such as mode interpolation. In this work, we propose a complementary perspective that treats hallucinations as instabilities on the model-induced manifold. We begin by showing that a hallucination filter based on such instabilities matches or exceeds the performance of the recently proposed temporal one. By tracing the source of these instabilities, we identify local intrinsic dimension (LID) as their primary driver and propose Intrinsic Quenching (IQ), a direct corrective mechanism that deflates it to alleviate hallucinations. IQ consistently outperforms standard hallucination reduction baselines across a wide array of benchmarks and offers a highly promising solution for enforcing anatomical consistency in downstream medical imaging tasks.

AIFeb 21, 2024
Position: Explain to Question not to Justify

Przemyslaw Biecek, Wojciech Samek

Explainable Artificial Intelligence (XAI) is a young but very promising field of research. Unfortunately, the progress in this field is currently slowed down by divergent and incompatible goals. We separate various threads tangled within the area of XAI into two complementary cultures of human/value-oriented explanations (BLUE XAI) and model/validation-oriented explanations (RED XAI). This position paper argues that the area of RED XAI is currently under-explored, i.e., more methods for explainability are desperately needed to question models (e.g., extract knowledge from well-performing models as well as spotting and fixing bugs in faulty models), and the area of RED XAI hides great opportunities and potential for important research necessary to ensure the safety of AI systems. We conclude this paper by presenting promising challenges in this area.

CVOct 16, 2024
Rethinking Visual Counterfactual Explanations Through Region Constraint

Bartlomiej Sobieski, Jakub Grzywaczewski, Bartlomiej Sadlej et al.

Visual counterfactual explanations (VCEs) have recently gained immense popularity as a tool for clarifying the decision-making process of image classifiers. This trend is largely motivated by what these explanations promise to deliver -- indicate semantically meaningful factors that change the classifier's decision. However, we argue that current state-of-the-art approaches lack a crucial component -- the region constraint -- whose absence prevents from drawing explicit conclusions, and may even lead to faulty reasoning due to phenomenons like confirmation bias. To address the issue of previous methods, which modify images in a very entangled and widely dispersed manner, we propose region-constrained VCEs (RVCEs), which assume that only a predefined image region can be modified to influence the model's prediction. To effectively sample from this subclass of VCEs, we propose Region-Constrained Counterfactual Schrödinger Bridges (RCSB), an adaptation of a tractable subclass of Schrödinger Bridges to the problem of conditional inpainting, where the conditioning signal originates from the classifier of interest. In addition to setting a new state-of-the-art by a large margin, we extend RCSB to allow for exact counterfactual reasoning, where the predefined region contains only the factor of interest, and incorporating the user to actively interact with the RVCE by predefining the regions manually.

MLDec 12, 2024
Investigating the Impact of Balancing, Filtering, and Complexity on Predictive Multiplicity: A Data-Centric Perspective

Mustafa Cavus, Przemyslaw Biecek

The Rashomon effect presents a significant challenge in model selection. It occurs when multiple models achieve similar performance on a dataset but produce different predictions, resulting in predictive multiplicity. This is especially problematic in high-stakes environments, where arbitrary model outcomes can have serious consequences. Traditional model selection methods prioritize accuracy and fail to address this issue. Factors such as class imbalance and irrelevant variables further complicate the situation, making it harder for models to provide trustworthy predictions. Data-centric AI approaches can mitigate these problems by prioritizing data optimization, particularly through preprocessing techniques. However, recent studies suggest preprocessing methods may inadvertently inflate predictive multiplicity. This paper investigates how data preprocessing techniques like balancing and filtering methods impact predictive multiplicity and model stability, considering the complexity of the data. We conduct the experiments on 21 real-world datasets, applying various balancing and filtering techniques, and assess the level of predictive multiplicity introduced by these methods by leveraging the Rashomon effect. Additionally, we examine how filtering techniques reduce redundancy and enhance model generalization. The findings provide insights into the relationship between balancing methods, data complexity, and predictive multiplicity, demonstrating how data-centric AI strategies can improve model performance.

CVApr 2, 2024
Red-Teaming Segment Anything Model

Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski et al.

Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, demonstrating that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts generated masks. (2) We focus on assessing whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model possesses some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and resistance to black-box attacks but also introduce a novel approach - Focused Iterative Gradient Attack (FIGA) that combines white-box approaches to construct an efficient attack resulting in a smaller number of modified pixels. All of our testing methods and analyses indicate a need for enhanced safety measures in foundation models for image segmentation.

CVMar 12, 2024
Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI

Vladimir Zaigrajew, Hubert Baniecki, Lukasz Tulczyjew et al.

Remote sensing (RS) applications in the space domain demand machine learning (ML) models that are reliable, robust, and quality-assured, making red teaming a vital approach for identifying and exposing potential flaws and biases. Since both fields advance independently, there is a notable gap in integrating red teaming strategies into RS. This paper introduces a methodology for examining ML models operating on hyperspectral images within the HYPERVIEW challenge, focusing on soil parameters' estimation. We use post-hoc explanation methods from the Explainable AI (XAI) domain to critically assess the best performing model that won the HYPERVIEW challenge and served as an inspiration for the model deployed on board the INTUITION-1 hyperspectral mission. Our approach effectively red teams the model by pinpointing and validating key shortcomings, constructing a model that achieves comparable performance using just 1% of the input features and a mere up to 5% performance loss. Additionally, we propose a novel way of visualizing explanations that integrate domain-specific information about hyperspectral bands (wavelengths) and data transformations to better suit interpreting models for hyperspectral image analysis.

LGMar 11, 2025
Birds look like cars: Adversarial analysis of intrinsically interpretable deep learning

Hubert Baniecki, Przemyslaw Biecek

A common belief is that intrinsically interpretable deep learning models ensure a correct, intuitive understanding of their behavior and offer greater robustness against accidental errors or intentional manipulation. However, these beliefs have not been comprehensively verified, and growing evidence casts doubt on them. In this paper, we highlight the risks related to overreliance and susceptibility to adversarial manipulation of these so-called "intrinsically (aka inherently) interpretable" models by design. We introduce two strategies for adversarial analysis with prototype manipulation and backdoor attacks against prototype-based networks, and discuss how concept bottleneck models defend against these attacks. Fooling the model's reasoning by exploiting its use of latent prototypes manifests the inherent uninterpretability of deep neural networks, leading to a false sense of security reinforced by a visual confirmation bias. The reported limitations of part-prototype networks put their trustworthiness and applicability into question, motivating further work on the robustness and alignment of (deep) interpretable models.

AIAug 27, 2025
Model Science: getting serious about verification, explanation and control of AI systems

Przemyslaw Biecek, Wojciech Samek

The growing adoption of foundation models calls for a paradigm shift from Data Science to Model Science. Unlike data-centric approaches, Model Science places the trained model at the core of analysis, aiming to interact, verify, explain, and control its behavior across diverse operational contexts. This paper introduces a conceptual framework for a new discipline called Model Science, along with the proposal for its four key pillars: Verification, which requires strict, context-aware evaluation protocols; Explanation, which is understood as various approaches to explore of internal model operations; Control, which integrates alignment techniques to steer model behavior; and Interface, which develops interactive and visual explanation tools to improve human calibration and decision-making. The proposed framework aims to guide the development of credible, safe, and human-aligned AI systems.

CVAug 7, 2025
Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions

Hubert Baniecki, Maximilian Muschalik, Fabian Fumagalli et al.

Language-image pre-training (LIP) enables the development of vision-language models capable of zero-shot classification, localization, multimodal retrieval, and semantic understanding. Various explanation methods have been proposed to visualize the importance of input image-text pairs on the model's similarity outputs. However, popular saliency maps are limited by capturing only first-order attributions, overlooking the complex cross-modal interactions intrinsic to such encoders. We introduce faithful interaction explanations of LIP models (FIxLIP) as a unified approach to decomposing the similarity in vision-language encoders. FIxLIP is rooted in game theory, where we analyze how using the weighted Banzhaf interaction index offers greater flexibility and improves computational efficiency over the Shapley interaction quantification framework. From a practical perspective, we propose how to naturally extend explanation evaluation metrics, such as the pointing game and area between the insertion/deletion curves, to second-order interaction explanations. Experiments on the MS COCO and ImageNet-1k benchmarks validate that second-order methods, such as FIxLIP, outperform first-order attribution methods. Beyond delivering high-quality explanations, we demonstrate the utility of FIxLIP in comparing different models, e.g. CLIP vs. SigLIP-2.

IVJul 7, 2025
X-ray transferable polyrepresentation learning

Weronika Hryniewska-Guzik, Przemyslaw Biecek

The success of machine learning algorithms is inherently related to the extraction of meaningful features, as they play a pivotal role in the performance of these algorithms. Central to this challenge is the quality of data representation. However, the ability to generalize and extract these features effectively from unseen datasets is also crucial. In light of this, we introduce a novel concept: the polyrepresentation. Polyrepresentation integrates multiple representations of the same modality extracted from distinct sources, for example, vector embeddings from the Siamese Network, self-supervised models, and interpretable radiomic features. This approach yields better performance metrics compared to relying on a single representation. Additionally, in the context of X-ray images, we demonstrate the transferability of the created polyrepresentation to a smaller dataset, underscoring its potential as a pragmatic and resource-efficient approach in various image-related solutions. It is worth noting that the concept of polyprepresentation on the example of medical data can also be applied to other domains, showcasing its versatility and broad potential impact.

LGJun 30, 2025
System-Embedded Diffusion Bridge Models

Bartlomiej Sobieski, Matthew Tivnan, Yuang Wang et al.

Solving inverse problems -- recovering signals from incomplete or noisy measurements -- is fundamental in science and engineering. Score-based generative models (SGMs) have recently emerged as a powerful framework for this task. Two main paradigms have formed: unsupervised approaches that adapt pretrained generative models to inverse problems, and supervised bridge methods that train stochastic processes conditioned on paired clean and corrupted data. While the former typically assume knowledge of the measurement model, the latter have largely overlooked this structural information. We introduce System embedded Diffusion Bridge Models (SDBs), a new class of supervised bridge methods that explicitly embed the known linear measurement system into the coefficients of a matrix-valued SDE. This principled integration yields consistent improvements across diverse linear inverse problems and demonstrates robust generalization under system misspecification between training and deployment, offering a promising solution to real-world applications.

LGJun 26, 2024
Efficient and Accurate Explanation Estimation with Distribution Compression

Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl et al.

We discover a theoretical connection between explanation estimation and distribution compression that significantly improves the approximation of feature attributions, importance, and effects. While the exact computation of various machine learning explanations requires numerous model inferences and becomes impractical, the computational cost of approximation increases with an ever-increasing size of data and model parameters. We show that the standard i.i.d. sampling used in a broad spectrum of algorithms for post-hoc explanation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm of sample-efficient explainability. It relies on distribution compression through kernel thinning to obtain a data sample that best approximates its marginal distribution. CTE significantly improves the accuracy and stability of explanation estimation with negligible computational overhead. It often achieves an on-par explanation approximation error 2-3x faster by using fewer samples, i.e. requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.

LGJun 13, 2024
On the Robustness of Global Feature Effect Explanations

Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl et al.

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.

AIAug 26, 2021
Graph-guided random forest for gene set selection

Bastian Pfeifer, Hubert Baniecki, Anna Saranti et al.

Machine learning methods can detect complex relationships between variables, but usually do not exploit domain knowledge. This is a limitation because in many scientific disciplines, such as systems biology, domain knowledge is available in the form of graphs or networks, and its use can improve model performance. We need network-based algorithms that are versatile and applicable in many research areas. In this work, we demonstrate subnetwork detection based on multi-modal node features using a novel Greedy Decision Forest with inherent interpretability. The latter will be a crucial factor to retain experts and gain their trust in such algorithms. To demonstrate a concrete application example, we focus on bioinformatics, systems biology and particularly biomedicine, but the presented methodology is applicable in many other domains as well. Systems biology is a good example of a field in which statistical data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important to reach the future goal of precision medicine, where the complexity of patients is modeled on a system level to best tailor medical decisions, health practices and therapies to the individual patient. Our proposed approach can help to uncover disease-causing network modules from multi-omics data to better understand complex diseases such as cancer.

LGMay 26, 2021
Fooling Partial Dependence via Data Poisoning

Hubert Baniecki, Wojciech Kretowicz, Przemyslaw Biecek

Many methods have been developed to understand complex predictive models and high expectations are placed on post-hoc model explainability. It turns out that such explanations are not robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which are among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability became a must-have trait supporting black-box machine learning. The fooling is performed via poisoning the data to bend and shift explanations in the desired direction using genetic and gradient algorithms. We believe this to be the first work using a genetic algorithm for manipulating explanations, which is transferable as it generalizes both ways: in a model-agnostic and an explanation-agnostic manner.

LGApr 7, 2021
Triplot: model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure

Katarzyna Pekala, Katarzyna Woznica, Przemyslaw Biecek

One of the key elements of explanatory analysis of a predictive model is to assess the importance of individual variables. Rapid development of the area of predictive model exploration (also called explainable artificial intelligence or interpretable machine learning) has led to the popularization of methods for local (instance level) and global (dataset level) methods, such as Permutational Variable Importance, Shapley Values (SHAP), Local Interpretable Model Explanations (LIME), Break Down and so on. However, these methods do not use information about the correlation between features which significantly reduce the explainability of the model behaviour. In this work, we propose new methods to support model analysis by exploiting the information about the correlation between variables. The dataset level aspect importance measure is inspired by the block permutations procedure, while the instance level aspect importance measure is inspired by the LIME method. We show how to analyze groups of variables (aspects) both when they are proposed by the user and when they should be determined automatically based on the hierarchical structure of correlations between variables. Additionally, we present the new type of model visualisation, triplot, which exploits a hierarchical structure of variable grouping to produce a high information density model visualisation. This visualisation provides a consistent illustration for either local or global model and data exploration. We also show an example of real-world data with 5k instances and 37 features in which a significant correlation between variables affects the interpretation of the effect of variable importance. The proposed method is, to our knowledge, the first to allow direct use of the correlation between variables in exploratory model analysis.

LGDec 28, 2020
dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python

Hubert Baniecki, Wojciech Kretowicz, Piotr Piatyszek et al.

The increasing amount of available data, computing power, and the constant pursuit for higher performance results in the growing complexity of predictive models. Their black-box nature leads to opaqueness debt phenomenon inflicting increased risks of discrimination, lack of reproducibility, and deflated performance due to data drift. To manage these risks, good MLOps practices ask for better validation of model performance and fairness, higher explainability, and continuous monitoring. The necessity of deeper model transparency appears not only from scientific and social domains, but also emerging laws and regulations on artificial intelligence. To facilitate the development of responsible machine learning models, we showcase dalex, a Python package which implements the model-agnostic interface for interactive model exploration. It adopts the design crafted through the development of various tools for responsible machine learning; thus, it aims at the unification of the existing solutions. This library's source code and documentation are available under open license at https://python.drwhy.ai/.

MLSep 28, 2020
Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring

Michael Bücker, Gero Szepannek, Alicja Gosiewska et al.

A major requirement for credit scoring models is to provide a maximally accurate risk prediction. Additionally, regulators demand these models to be transparent and auditable. Thus, in credit scoring, very simple predictive models such as logistic regression or decision trees are still widely used and the superior predictive power of modern machine learning algorithms cannot be fully leveraged. Significant potential is therefore missed, leading to higher reserves or more credit defaults. This paper works out different dimensions that have to be considered for making credit scoring models understandable and presents a framework for making ``black box'' machine learning models transparent, auditable and explainable. Following this framework, we present an overview of techniques, demonstrate how they can be applied in credit scoring and how results compare to the interpretability of score cards. A real world case study shows that a comparable degree of interpretability can be achieved while machine learning techniques keep their ability to improve predictive power.

LGSep 24, 2020
Landscape of R packages for eXplainable Artificial Intelligence

Szymon Maksymiuk, Alicja Gosiewska, Przemyslaw Biecek

The growing availability of data and computing power fuels the development of predictive models. In order to ensure the safe and effective functioning of such models, we need methods for exploration, debugging, and validation. New methods and tools for this purpose are being developed within the eXplainable Artificial Intelligence (XAI) subdomain of machine learning. In this work (1) we present the taxonomy of methods for model explanations, (2) we identify and compare 27 packages available in R to perform XAI analysis, (3) we present an example of an application of particular packages, (4) we acknowledge recent trends in XAI. The article is primarily devoted to the tools available in R, but since it is easy to integrate the Python code, we will also show examples for the most popular libraries from Python.

LGFeb 11, 2020
Lifting Interpretability-Performance Trade-off via Automated Feature Engineering

Alicja Gosiewska, Przemyslaw Biecek

Complex black-box predictive models may have high performance, but lack of interpretability causes problems like lack of trust, lack of stability, sensitivity to concept drift. On the other hand, achieving satisfactory accuracy of interpretable models require more time-consuming work related to feature engineering. Can we train interpretable and accurate models, without timeless feature engineering? We propose a method that uses elastic black-boxes as surrogate models to create a simpler, less opaque, yet still accurate and interpretable glass-box models. New models are created on newly engineered features extracted with the help of a surrogate model. We supply the analysis by a large-scale benchmark on several tabular data sets from the OpenML database. There are two results 1) extracting information from complex models may improve the performance of linear models, 2) questioning a common myth that complex machine learning models outperform linear models.

CLOct 6, 2019
Named Entity Recognition -- Is there a glass ceiling?

Tomasz Stanislawek, Anna Wróblewska, Alicja Wójcicka et al.

Recent developments in Named Entity Recognition (NER) have resulted in better and better models. However, is there a glass ceiling? Do we know which types of errors are still hard or even impossible to correct? In this paper, we present a detailed analysis of the types of errors in state-of-the-art machine learning (ML) methods. Our study reveals the weak and strong points of the Stanford, CMU, FLAIR, ELMO and BERT models, as well as their shared limitations. We also introduce new techniques for improving annotation, for training processes and for checking a model's quality and stability. Presented results are based on the CoNLL 2003 data set for the English language. A new enriched semantic annotation of errors for this data set and new diagnostic data sets are attached in the supplementary materials.

LGAug 24, 2019
EPP: interpretable score of model predictive power

Alicja Gosiewska, Mateusz Bakala, Katarzyna Woznica et al.

The most important part of model selection and hyperparameter tuning is the evaluation of model performance. The most popular measures, such as AUC, F1, ACC for binary classification, or RMSE, MAD for regression, or cross-entropy for multilabel classification share two common weaknesses. First is, that they are not on an interval scale. It means that the difference in performance for the two models has no direct interpretation. It makes no sense to compare such differences between datasets. Second is, that for k-fold cross-validation, the model performance is in most cases calculated as an average performance from particular folds, which neglects the information how stable is the performance for different folds. In this talk, we introduce a new EPP rating system for predictive models. We also demonstrate numerous advantages for this system, First, differences in EPP scores have probabilistic interpretation. Based on it we can assess the probability that one model will achieve better performance than another. Second, EPP scores can be directly compared between datasets. Third, they can be used for navigated hyperparameter tuning and model selection. Forth, we can create embeddings for datasets based on EPP scores.

LGJul 9, 2019
Model Development Process

Przemyslaw Biecek

Predictive modeling has an increasing number of applications in various fields. High demand for predictive models drives creation of tools that automate and support work of data scientist on the model development. To better understand what can be automated we need first a description of the model life-cycle. In this paper we propose a generic Model Development Process (MDP). This process is inspired by Rational Unified Process (RUP) which was designed for software development. There are other approached to process description, like CRISP DM or ASUM DM, in this paper we discuss similarities and differences between these methodologies. We believe that the proposed open standard for model development will facilitate creation of tools for automation of model training, testing and maintaining.

LGMar 27, 2019
Do Not Trust Additive Explanations

Alicja Gosiewska, Przemyslaw Biecek

Explainable Artificial Intelligence (XAI)has received a great deal of attention recently. Explainability is being presented as a remedy for the distrust of complex and opaque models. Model agnostic methods such as LIME, SHAP, or Break Down promise instance-level interpretability for any complex machine learning model. But how faithful are these additive explanations? Can we rely on additive explanations for non-additive models? In this paper, we (1) examine the behavior of the most popular instance-level explanations under the presence of interactions, (2) introduce a new method that detects interactions for instance-level explanations, (3) perform a large scale benchmark to see how frequently additive explanations may be misleading.

COMar 27, 2019
The Landscape of R Packages for Automated Exploratory Data Analysis

Mateusz Staniak, Przemyslaw Biecek

The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to the increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is the Exploratory Data Analysis, crucial for better domain understanding, data cleaning, data validation, and feature engineering. There is a growing number of libraries that attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of twelve popular R packages to identify the parts of analysis that can be effectively automated with the current tools and to point out new directions for further autoEDA development.

LGFeb 28, 2019
SAFE ML: Surrogate Assisted Feature Extraction for Model Learning

Alicja Gosiewska, Aleksandra Gacek, Piotr Lubon et al.

Complex black-box predictive models may have high accuracy, but opacity causes problems like lack of trust, lack of stability, sensitivity to concept drift. On the other hand, interpretable models require more work related to feature engineering, which is very time consuming. Can we train interpretable and accurate models, without timeless feature engineering? In this article, we show a method that uses elastic black-boxes as surrogate models to create a simpler, less opaque, yet still accurate and interpretable glass-box models. New models are created on newly engineered features extracted/learned with the help of a surrogate model. We show applications of this method for model level explanations and possible extensions for instance level explanations. We also present an example implementation in Python and benchmark this method on a number of tabular data sets.

CLDec 5, 2018
Are you tough enough? Framework for Robustness Validation of Machine Comprehension Systems

Barbara Rychalska, Dominika Basaj, Przemyslaw Biecek

Deep Learning NLP domain lacks procedures for the analysis of model robustness. In this paper we propose a framework which validates robustness of any Question Answering model through model explainers. We propose that a robust model should transgress the initial notion of semantic similarity induced by word embeddings to learn a more human-like understanding of meaning. We test this property by manipulating questions in two ways: swapping important question word for 1) its semantically correct synonym and 2) for word vector that is close in embedding space. We estimate importance of words in asked questions with Locally Interpretable Model Agnostic Explanations method (LIME). With these two steps we compare state-of-the-art Q&A models. We show that although accuracy of state-of-the-art models is high, they are very fragile to changes in the input. Moreover, we propose 2 adversarial training scenarios which raise model sensitivity to true synonyms by up to 7% accuracy measure. Our findings help to understand which models are more stable and how they can be improved. In addition, we have created and published a new dataset that may be used for validation of robustness of a Q&A model.

COSep 19, 2018
auditor: an R Package for Model-Agnostic Visual Validation and Diagnostics

Alicja Gosiewska, Przemyslaw Biecek

Machine learning models have spread to almost every area of life. They are successfully applied in biology, medicine, finance, physics, and other fields. With modern software it is easy to train even a~complex model that fits the training data and results in high accuracy on the test set. The problem arises when models fail confronted with real-world data. This paper describes methodology and tools for model-agnostic audit. Introduced techniques facilitate assessing and comparing the goodness of fit and performance of models. In~addition, they may be used for the analysis of the similarity of residuals and for identification of~outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. Presented methods were implemented in the auditor package for R. Due to flexible and~consistent grammar, it is simple to validate models of any classes.

CLSep 11, 2018
Does it care what you asked? Understanding Importance of Verbs in Deep Learning QA System

Barbara Rychalska, Dominika Basaj, Przemyslaw Biecek et al.

In this paper we present the results of an investigation of the importance of verbs in a deep learning QA system trained on SQuAD dataset. We show that main verbs in questions carry little influence on the decisions made by the system - in over 90% of researched cases swapping verbs for their antonyms did not change system decision. We track this phenomenon down to the insides of the net, analyzing the mechanism of self-attention and values contained in hidden layers of RNN. Finally, we recognize the characteristics of the SQuAD dataset as the source of the problem. Our work refers to the recently popular topic of adversarial examples in NLP, combined with investigating deep net structure.

CLSep 11, 2018
How much should you ask? On the question structure in QA systems

Dominika Basaj, Barbara Rychalska, Przemyslaw Biecek et al.

Datasets that boosted state-of-the-art solutions for Question Answering (QA) systems prove that it is possible to ask questions in natural language manner. However, users are still used to query-like systems where they type in keywords to search for answer. In this study we validate which parts of questions are essential for obtaining valid answer. In order to conclude that, we take advantage of LIME - a framework that explains prediction by local approximation. We find that grammar and natural language is disregarded by QA. State-of-the-art model can answer properly even if 'asked' only with a few words with high coefficients calculated with LIME. According to our knowledge, it is the first time that QA model is being explained by LIME.

MLJun 23, 2018
DALEX: explainers for complex predictive models

Przemyslaw Biecek

Predictive modeling is invaded by elastic, yet complex methods such as neural networks or ensembles (model stacking, boosting or bagging). Such methods are usually described by a large number of parameters or hyper parameters - a price that one needs to pay for elasticity. The very number of parameters makes models hard to understand. This paper describes a consistent collection of explainers for predictive models, a.k.a. black boxes. Each explainer is a technique for exploration of a black box model. Presented approaches are model-agnostic, what means that they extract useful information from any predictive method despite its internal structure. Each explainer is linked with a specific aspect of a model. Some are useful in decomposing predictions, some serve better in understanding performance, while others are useful in understanding importance and conditional responses of a particular variable. Every explainer presented in this paper works for a single model or for a collection of models. In the latter case, models can be compared against each other. Such comparison helps to find strengths and weaknesses of different approaches and gives additional possibilities for model validation. Presented explainers are implemented in the DALEX package for R. They are based on a uniform standardized grammar of model exploration which may be easily extended. The current implementation supports the most popular frameworks for classification and regression.

MLApr 5, 2018
Explanations of model predictions with live and breakDown packages

Mateusz Staniak, Przemyslaw Biecek

Complex models are commonly used in predictive modeling. In this paper we present R packages that can be used to explain predictions from complex black box models and attribute parts of these predictions to input features. We introduce two new approaches and corresponding packages for such attribution, namely live and breakDown. We also compare their results with existing implementations of state of the art solutions, namely lime that implements Locally Interpretable Model-agnostic Explanations and ShapleyR that implements Shapley values.

MLSep 13, 2017
The Merging Path Plot: adaptive fusing of k-groups with likelihood-based model selection

Agnieszka Sitko, Przemyslaw Biecek

There are many statistical tests that verify the null hypothesis: the variable of interest has the same distribution among k-groups. But once the null hypothesis is rejected, how to present the structure of dissimilarity between groups? In this article, we introduce The Merging Path Plot - a methodology, and factorMerger - an R package, for exploration and visualization of k-group dissimilarities. Comparison of k-groups is one of the most important issues in exploratory analyses and it has zillions of applications. The classical solution is to test a~null hypothesis that observations from all groups come from the same distribution. If the global null hypothesis is rejected, a~more detailed analysis of differences among pairs of groups is performed. The traditional approach is to use pairwise post hoc tests in order to verify which groups differ significantly. However, this approach fails with a large number of groups in both interpretation and visualization layer. The~Merging Path Plot methodology solves this problem by using an easy-to-understand description of dissimilarity among groups based on Likelihood Ratio Test (LRT) statistic.