Henning Muller

CV
h-index41
8papers
152citations
Novelty43%
AI Score35

8 Papers

IVNov 9, 2022Code
Novel structural-scale uncertainty measures and error retention curves: application to multiple sclerosis

Nataliia Molchanova, Vatsal Raina, Andrey Malinin et al.

This paper focuses on the uncertainty estimation for white matter lesions (WML) segmentation in magnetic resonance imaging (MRI). On one side, voxel-scale segmentation errors cause the erroneous delineation of the lesions; on the other side, lesion-scale detection errors lead to wrong lesion counts. Both of these factors are clinically relevant for the assessment of multiple sclerosis patients. This work aims to compare the ability of different voxel- and lesion-scale uncertainty measures to capture errors related to segmentation and lesion detection, respectively. Our main contributions are (i) proposing new measures of lesion-scale uncertainty that do not utilise voxel-scale uncertainties; (ii) extending an error retention curves analysis framework for evaluation of lesion-scale uncertainty measures. Our results obtained on the multi-center testing set of 58 patients demonstrate that the proposed lesion-scale measure achieves the best performance among the analysed measures. All code implementations are provided at https://github.com/NataliiaMolch/MS_WML_uncs

CVNov 15, 2023Code
Structural-Based Uncertainty in Deep Learning Across Anatomical Scales: Analysis in White Matter Lesion Segmentation

Nataliia Molchanova, Vatsal Raina, Andrey Malinin et al.

This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools in the context of white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured output segmentation tasks. First, we postulate that a reliable uncertainty measure should indicate predictions likely to be incorrect with high uncertainty values. Second, we investigate the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient). We hypothesize that uncertainty at each scale is related to specific types of errors. Our study aims to confirm this relationship by conducting separate analyses for in-domain and out-of-domain settings. Our primary methodological contributions are (i) the development of novel measures for quantifying uncertainty at lesion and patient scales, derived from structural prediction discrepancies, and (ii) the extension of an error retention curve analysis framework to facilitate the evaluation of UQ performance at both lesion and patient scales. The results from a multi-centric MRI dataset of 444 patients demonstrate that our proposed measures more effectively capture model errors at the lesion and patient scales compared to measures that average voxel-scale uncertainty values. We provide the UQ protocols code at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.

IVFeb 10, 2023Code
Tackling Bias in the Dice Similarity Coefficient: Introducing nDSC for White Matter Lesion Segmentation

Vatsal Raina, Nataliia Molchanova, Mara Graziani et al.

The development of automatic segmentation techniques for medical imaging tasks requires assessment metrics to fairly judge and rank such approaches on benchmarks. The Dice Similarity Coefficient (DSC) is a popular choice for comparing the agreement between the predicted segmentation against a ground-truth mask. However, the DSC metric has been shown to be biased to the occurrence rate of the positive class in the ground-truth, and hence should be considered in combination with other metrics. This work describes a detailed analysis of the recently proposed normalised Dice Similarity Coefficient (nDSC) for binary segmentation tasks as an adaptation of DSC which scales the precision at a fixed recall rate to tackle this bias. White matter lesion segmentation on magnetic resonance images of multiple sclerosis patients is selected as a case study task to empirically assess the suitability of nDSC. We validate the normalised DSC using two different models across 59 subject scans with a wide range of lesion loads. It is found that the nDSC is less biased than DSC with lesion load on standard white matter lesion segmentation benchmarks measured using standard rank correlation coefficients. An implementation of nDSC is made available at: https://github.com/NataliiaMolch/nDSC .

CVApr 19, 2023
Disentangling Neuron Representations with Concept Vectors

Laura O'Mahony, Vincent Andrearczyk, Henning Muller et al.

Mechanistic interpretability aims to understand how models store representations by breaking down neural networks into interpretable units. However, the occurrence of polysemantic neurons, or neurons that respond to multiple unrelated features, makes interpreting individual neurons challenging. This has led to the search for meaningful vectors, known as concept vectors, in activation space instead of individual neurons. The main contribution of this paper is a method to disentangle polysemantic neurons into concept vectors encapsulating distinct features. Our method can search for fine-grained concepts according to the user's desired level of concept separation. The analysis shows that polysemantic neurons can be disentangled into directions consisting of linear combinations of neurons. Our evaluations show that the concept vectors found encode coherent, human-understandable features.

CVNov 14, 2022
Unsupervised Method for Intra-patient Registration of Brain Magnetic Resonance Images based on Objective Function Weighting by Inverse Consistency: Contribution to the BraTS-Reg Challenge

Marek Wodzinski, Artur Jurgas, Niccolo Marini et al.

Registration of brain scans with pathologies is difficult, yet important research area. The importance of this task motivated researchers to organize the BraTS-Reg challenge, jointly with IEEE ISBI 2022 and MICCAI 2022 conferences. The organizers introduced the task of aligning pre-operative to follow-up magnetic resonance images of glioma. The main difficulties are connected with the missing data leading to large, nonrigid, and noninvertible deformations. In this work, we describe our contributions to both the editions of the BraTS-Reg challenge. The proposed method is based on combined deep learning and instance optimization approaches. First, the instance optimization enriches the state-of-the-art LapIRN method to improve the generalizability and fine-details preservation. Second, an additional objective function weighting is introduced, based on the inverse consistency. The proposed method is fully unsupervised and exhibits high registration quality and robustness. The quantitative results on the external validation set are: (i) IEEE ISBI 2022 edition: 1.85, and 0.86, (ii) MICCAI 2022 edition: 1.71, and 0.86, in terms of the mean of median absolute error and robustness respectively. The method scored the 1st place during the IEEE ISBI 2022 version of the challenge and the 3rd place during the MICCAI 2022. Future work could transfer the inverse consistency-based weighting directly into the deep network training.

IVFeb 7, 2025Code
Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge

Muhammad Imran, Jonathan R. Krebs, Vishal Balaji Sivaraman et al.

Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently available to support the development of multi-class aortic segmentation methods. To address this gap, we organized the AortaSeg24 MICCAI Challenge, introducing the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This dataset was designed to facilitate both model development and validation. The challenge attracted 121 teams worldwide, with participants leveraging state-of-the-art frameworks such as nnU-Net and exploring novel techniques, including cascaded models, data augmentation strategies, and custom loss functions. We evaluated the submitted algorithms using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD), highlighting the approaches adopted by the top five performing teams. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms. The annotated dataset, evaluation code, and implementations of the leading methods are publicly available to support further research. All resources can be accessed at https://aortaseg24.grand-challenge.org.

CVAug 4, 2020Code
Learning Interpretable Microscopic Features of Tumor by Multi-task Adversarial CNNs To Improve Generalization

Mara Graziani, Sebastian Otalora, Stephane Marchand-Maillet et al.

Adopting Convolutional Neural Networks (CNNs) in the daily routine of primary diagnosis requires not only near-perfect precision, but also a sufficient degree of generalization to data acquisition shifts and transparency. Existing CNN models act as black boxes, not ensuring to the physicians that important diagnostic features are used by the model. Building on top of successfully existing techniques such as multi-task learning, domain adversarial training and concept-based interpretability, this paper addresses the challenge of introducing diagnostic factors in the training objectives. Here we show that our architecture, by learning end-to-end an uncertainty-based weighting combination of multi-task and adversarial losses, is encouraged to focus on pathology features such as density and pleomorphism of nuclei, e.g. variations in size and appearance, while discarding misleading features such as staining differences. Our results on breast lymph node tissue show significantly improved generalization in the detection of tumorous tissue, with best average AUC 0.89 (0.01) against the baseline AUC 0.86 (0.005). By applying the interpretability technique of linearly probing intermediate representations, we also demonstrate that interpretable pathology features such as nuclei density are learned by the proposed CNN architecture, confirming the increased transparency of this model. This result is a starting point towards building interpretable multi-task architectures that are robust to data heterogeneity. Our code is available at https://github.com/maragraziani/multitask_adversarial

IVDec 13, 2021
The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients

Bhakti Baheti, Satrajit Chakrabarty, Hamed Akbari et al.

Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in developing general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registration (BraTS-Reg) challenge, as the first public benchmark environment for deformable registration algorithms focusing on estimating correspondences between pre-operative and follow-up scans of the same patient diagnosed with a diffuse brain glioma. The BraTS-Reg data comprise de-identified multi-institutional multi-parametric MRI (mpMRI) scans, curated for size and resolution according to a canonical anatomical template, and divided into training, validation, and testing sets. Clinical experts annotated ground truth (GT) landmark points of anatomical locations distinct across the temporal domain. Quantitative evaluation and ranking were based on the Median Euclidean Error (MEE), Robustness, and the determinant of the Jacobian of the displacement field. The top-ranked methodologies yielded similar performance across all evaluation metrics and shared several methodological commonalities, including pre-alignment, deep neural networks, inverse consistency analysis, and test-time instance optimization per-case basis as a post-processing step. The top-ranked method attained the MEE at or below that of the inter-rater variability for approximately 60% of the evaluated landmarks, underscoring the scope for further accuracy and robustness improvements, especially relative to human experts. The aim of BraTS-Reg is to continue to serve as an active resource for research, with the data and online evaluation tools accessible at https://bratsreg.github.io/.