Maciej Mozolewski

LG
h-index18
5papers
14citations
Novelty40%
AI Score41

5 Papers

IVApr 29
Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution

Karol Dobiczek, Maciej Mozolewski, Szymon Bobek et al.

Deep learning models for 12-lead electrocardiogram (ECG) analysis achieve high diagnostic performance but lack the intuitive interpretability required for clinical integration. Standard feature attribution methods are limited by the inherent difficulty in mapping abstract waveform fluctuations to physical anatomical pathologies. To resolve this, we propose a cross-modal method that projects feature attributions from high-performance 12-lead ECG models onto the CineECG 3D anatomical space. Our study reveals that while models trained directly on CineECG signals suffer from reduced accuracy and incoherent attributions, the proposed mapping mechanism effectively recovers clinically relevant feature rankings. Validated against a ground-truth dataset of 20 cases annotated by domain experts, the mapped explanations yield a Dice score of 0.56, significantly outperforming the 0.47 baseline of standard 12-lead attributions. These findings indicate that cross-modal averaging mapping effectively filters attribution instability and improves the localization of pathological features, combining the diagnostic expressiveness of standard ECG with the intuitive clarity of anatomical visualization.

CYOct 21, 2024
Dataset resulting from the user study on comprehensibility of explainable AI algorithms

Szymon Bobek, Paloma Korycińska, Monika Krakowska et al.

This paper introduces a dataset that is the result of a user study on the comprehensibility of explainable artificial intelligence (XAI) algorithms. The study participants were recruited from 149 candidates to form three groups representing experts in the domain of mycology (DE), students with a data science and visualization background (IT) and students from social sciences and humanities (SSH). The main part of the dataset contains 39 transcripts of interviews during which participants were asked to complete a series of tasks and questions related to the interpretation of explanations of decisions of a machine learning model trained to distinguish between edible and inedible mushrooms. The transcripts were complemented with additional data that includes visualizations of explanations presented to the user, results from thematic analysis, recommendations of improvements of explanations provided by the participants, and the initial survey results that allow to determine the domain knowledge of the participant and data analysis literacy. The transcripts were manually tagged to allow for automatic matching between the text and other data related to particular fragments. In the advent of the area of rapid development of XAI techniques, the need for a multidisciplinary qualitative evaluation of explainability is one of the emerging topics in the community. Our dataset allows not only to reproduce the study we conducted, but also to open a wide range of possibilities for the analysis of the material we gathered.

LGAug 3, 2025
Explaining Time Series Classifiers with PHAR: Rule Extraction and Fusion from Post-hoc Attributions

Maciej Mozolewski, Szymon Bobek, Grzegorz J. Nalepa

Explaining machine learning (ML) models for time series (TS) classification remains challenging due to the difficulty of interpreting raw time series and the high dimensionality of the input space. We introduce PHAR-Post-hoc Attribution Rules - a unified framework that transforms numeric feature attributions from post-hoc, instance-wise explainers (e.g., LIME, SHAP) into structured, human-readable rules. These rules define human-readable intervals that indicate where and when decision-relevant segments occur and can enhance model transparency by localizing threshold-based conditions on the raw series. PHAR performs comparably to native rule-based methods, such as Anchor, while scaling more efficiently to long TS sequences and achieving broader instance coverage. A dedicated rule fusion step consolidates rule sets using strategies like weighted selection and lasso-based refinement, balancing key quality metrics: coverage, confidence, and simplicity. This fusion ensures each instance receives a concise and unambiguous rule, improving both explanation fidelity and consistency. We further introduce visualization techniques to illustrate specificity-generalization trade-offs in the derived rules. PHAR resolves conflicting and overlapping explanations - a common effect of the Rashomon phenomenon - into coherent, domain-adaptable insights. Comprehensive experiments on UCR/UEA Time Series Classification Archive demonstrate that PHAR may improve interpretability, decision transparency, and practical applicability for TS classification tasks by providing concise, human-readable rules aligned with model predictions.

AIOct 21, 2024
User-centric evaluation of explainability of AI with and for humans: a comprehensive empirical study

Szymon Bobek, Paloma Korycińska, Monika Krakowska et al.

This study is located in the Human-Centered Artificial Intelligence (HCAI) and focuses on the results of a user-centered assessment of commonly used eXplainable Artificial Intelligence (XAI) algorithms, specifically investigating how humans understand and interact with the explanations provided by these algorithms. To achieve this, we employed a multi-disciplinary approach that included state-of-the-art research methods from social sciences to measure the comprehensibility of explanations generated by a state-of-the-art lachine learning model, specifically the Gradient Boosting Classifier (XGBClassifier). We conducted an extensive empirical user study involving interviews with 39 participants from three different groups, each with varying expertise in data science, data visualization, and domain-specific knowledge related to the dataset used for training the machine learning model. Participants were asked a series of questions to assess their understanding of the model's explanations. To ensure replicability, we built the model using a publicly available dataset from the UC Irvine Machine Learning Repository, focusing on edible and non-edible mushrooms. Our findings reveal limitations in existing XAI methods and confirm the need for new design principles and evaluation techniques that address the specific information needs and user perspectives of different classes of AI stakeholders. We believe that the results of our research and the cross-disciplinary methodology we developed can be successfully adapted to various data types and user profiles, thus promoting dialogue and address opportunities in HCAI research. To support this, we are making the data resulting from our study publicly available.

LGOct 22, 2025
From Prototypes to Sparse ECG Explanations: SHAP-Driven Counterfactuals for Multivariate Time-Series Multi-class Classification

Maciej Mozolewski, Betül Bayrak, Kerstin Bach et al.

In eXplainable Artificial Intelligence (XAI), instance-based explanations for time series have gained increasing attention due to their potential for actionable and interpretable insights in domains such as healthcare. Addressing the challenges of explainability of state-of-the-art models, we propose a prototype-driven framework for generating sparse counterfactual explanations tailored to 12-lead ECG classification models. Our method employs SHAP-based thresholds to identify critical signal segments and convert them into interval rules, uses Dynamic Time Warping (DTW) and medoid clustering to extract representative prototypes, and aligns these prototypes to query R-peaks for coherence with the sample being explained. The framework generates counterfactuals that modify only 78% of the original signal while maintaining 81.3% validity across all classes and achieving 43% improvement in temporal stability. We evaluate three variants of our approach, Original, Sparse, and Aligned Sparse, with class-specific performance ranging from 98.9% validity for myocardial infarction (MI) to challenges with hypertrophy (HYP) detection (13.2%). This approach supports near realtime generation (< 1 second) of clinically valid counterfactuals and provides a foundation for interactive explanation platforms. Our findings establish design principles for physiologically-aware counterfactual explanations in AI-based diagnosis systems and outline pathways toward user-controlled explanation interfaces for clinical deployment.