IVMar 8, 2022
Diffusion Models for Medical Anomaly DetectionJulia Wolleb, Florentin Bieder, Robin Sandkühler et al.
In medical applications, weakly supervised anomaly detection methods are of great interest, as only image-level annotations are required for training. Current anomaly detection methods mainly rely on generative adversarial networks or autoencoder models. Those models are often complicated to train or have difficulties to preserve fine details in the image. We present a novel weakly supervised anomaly detection method based on denoising diffusion implicit models. We combine the deterministic iterative noising and denoising scheme with classifier guidance for image-to-image translation between diseased and healthy subjects. Our method generates very detailed anomaly maps without the need for a complex training procedure. We evaluate our method on the BRATS2020 dataset for brain tumor detection and the CheXpert dataset for detecting pleural effusions.
CVMar 27, 2023
Memory-Efficient 3D Denoising Diffusion Models for Medical Image ProcessingFlorentin Bieder, Julia Wolleb, Alicia Durrer et al.
Denoising diffusion models have recently achieved state-of-the-art performance in many image-generation tasks. They do, however, require a large amount of computational resources. This limits their application to medical tasks, where we often deal with large 3D volumes, like high-resolution three-dimensional data. In this work, we present a number of different ways to reduce the resource consumption for 3D diffusion models and apply them to a dataset of 3D images. The main contribution of this paper is the memory-efficient patch-based diffusion model \textit{PatchDDM}, which can be applied to the total volume during inference while the training is performed only on patches. While the proposed diffusion model can be applied to any image generation tasks, we evaluate the method on the tumor segmentation task of the BraTS2020 dataset and demonstrate that we can generate meaningful three-dimensional segmentations.
CVMar 13Code
NOIR: Neural Operator mapping for Implicit RepresentationsSidaty El Hadramy, Nazim Haouchine, Michael Wehrli et al.
This paper presents NOIR, a framework that reframes core medical imaging tasks as operator learning between continuous function spaces, challenging the prevailing paradigm of discrete grid-based deep learning. Instead of operating on fixed pixel or voxel grids, NOIR embeds discrete medical signals into shared Implicit Neural Representations and learns a Neural Operator that maps between their latent modulations, enabling resolution-independent function-to-function transformations. We evaluate NOIR across multiple 2D and 3D downstream tasks, including segmentation, shape completion, image-to-image translation, and image synthesis, on several public datasets such as Shenzhen, OASIS-4, SkullBreak, fastMRI, as well as an in-house clinical dataset. It achieves competitive performance at native resolution while demonstrating strong robustness to unseen discretizations, and empirically satisfies key theoretical properties of neural operators. The project page is available here: https://github.com/Sidaty1/NOIR-io.
CVApr 6, 2022
The Swiss Army Knife for Image-to-Image Translation: Multi-Task Diffusion ModelsJulia Wolleb, Robin Sandkühler, Florentin Bieder et al.
Recently, diffusion models were applied to a wide range of image analysis tasks. We build on a method for image-to-image translation using denoising diffusion implicit models and include a regression problem and a segmentation problem for guiding the image generation to the desired output. The main advantage of our approach is that the guidance during the denoising process is done by an external gradient. Consequently, the diffusion model does not need to be retrained for the different tasks on the same dataset. We apply our method to simulate the aging process on facial photos using a regression task, as well as on a brain magnetic resonance (MR) imaging dataset for the simulation of brain tumor growth. Furthermore, we use a segmentation model to inpaint tumors at the desired location in healthy slices of brain MR images. We achieve convincing results for all problems.
IVMar 14, 2023
Point Cloud Diffusion Models for Automatic Implant GenerationPaul Friedrich, Julia Wolleb, Florentin Bieder et al.
Advances in 3D printing of biocompatible materials make patient-specific implants increasingly popular. The design of these implants is, however, still a tedious and largely manual process. Existing approaches to automate implant generation are mainly based on 3D U-Net architectures on downsampled or patch-wise data, which can result in a loss of detail or contextual information. Following the recent success of Diffusion Probabilistic Models, we propose a novel approach for implant generation based on a combination of 3D point cloud diffusion models and voxelization networks. Due to the stochastic sampling process in our diffusion model, we can propose an ensemble of different implants per defect, from which the physicians can choose the most suitable one. We evaluate our method on the SkullBreak and SkullFix datasets, generating high-quality implants and achieving competitive evaluation scores.
IVMar 14, 2023
Diffusion Models for Contrast Harmonization of Magnetic Resonance ImagesAlicia Durrer, Julia Wolleb, Florentin Bieder et al.
Magnetic resonance (MR) images from multiple sources often show differences in image contrast related to acquisition settings or the used scanner type. For long-term studies, longitudinal comparability is essential but can be impaired by these contrast differences, leading to biased results when using automated evaluation tools. This study presents a diffusion model-based approach for contrast harmonization. We use a data set consisting of scans of 18 Multiple Sclerosis patients and 22 healthy controls. Each subject was scanned in two MR scanners of different magnetic field strengths (1.5 T and 3 T), resulting in a paired data set that shows scanner-inherent differences. We map images from the source contrast to the target contrast for both directions, from 3 T to 1.5 T and from 1.5 T to 3 T. As we only want to change the contrast, not the anatomical information, our method uses the original image to guide the image-to-image translation process by adding structural information. The aim is that the mapped scans display increased comparability with scans of the target contrast for downstream tasks. We evaluate this method for the task of segmentation of cerebrospinal fluid, grey matter and white matter. Our method achieves good and consistent results for both directions of the mapping.
IVJan 31, 2023
Improved distinct bone segmentation in upper-body CT through multi-resolution networksEva Schnider, Julia Wolleb, Antal Huck et al.
Purpose: Automated distinct bone segmentation from CT scans is widely used in planning and navigation workflows. U-Net variants are known to provide excellent results in supervised semantic segmentation. However, in distinct bone segmentation from upper body CTs a large field of view and a computationally taxing 3D architecture are required. This leads to low-resolution results lacking detail or localisation errors due to missing spatial context when using high-resolution inputs. Methods: We propose to solve this problem by using end-to-end trainable segmentation networks that combine several 3D U-Nets working at different resolutions. Our approach, which extends and generalizes HookNet and MRN, captures spatial information at a lower resolution and skips the encoded information to the target network, which operates on smaller high-resolution inputs. We evaluated our proposed architecture against single resolution networks and performed an ablation study on information concatenation and the number of context networks. Results: Our proposed best network achieves a median DSC of 0.86 taken over all 125 segmented bone classes and reduces the confusion among similar-looking bones in different locations. These results outperform our previously published 3D U-Net baseline results on the task and distinct-bone segmentation results reported by other groups. Conclusion: The presented multi-resolution 3D U-Nets address current shortcomings in bone segmentation from upper-body CT scans by allowing for capturing a larger field of view while avoiding the cubic growth of the input pixels and intermediate computations that quickly outgrow the computational capacities in 3D. The approach thus improves the accuracy and efficiency of distinct bone segmentation from upper-body CT.
LGOct 28, 2022
The secret role of undesired physical effects in accurate shape sensing with eccentric FBGsSamaneh Manavi Roodsari, Sara Freund, Martin Angelmahr et al.
Fiber optic shape sensors have enabled unique advances in various navigation tasks, from medical tool tracking to industrial applications. Eccentric fiber Bragg gratings (FBG) are cheap and easy-to-fabricate shape sensors that are often interrogated with simple setups. However, using low-cost interrogation systems for such intensity-based quasi-distributed sensors introduces further complications to the sensor's signal. Therefore, eccentric FBGs have not been able to accurately estimate complex multi-bend shapes. Here, we present a novel technique to overcome these limitations and provide accurate and precise shape estimation in eccentric FBG sensors. We investigate the most important bending-induced effects in curved optical fibers that are usually eliminated in intensity-based fiber sensors. These effects contain shape deformation information with a higher spatial resolution that we are now able to extract using deep learning techniques. We design a deep learning model based on a convolutional neural network that is trained to predict shapes given the sensor's spectra. We also provide a visual explanation, highlighting wavelength elements whose intensities are more relevant in making shape predictions. These findings imply that deep learning techniques benefit from the bending-induced effects that impact the desired signal in a complex manner. This is the first step toward cheap yet accurate fiber shape sensing solutions.
NAOct 30, 2018
Adaptive Eigenspace SegmentationUri Nahum, Philippe C. Cattin
Image segmentation is an inherently ill-posed problem and thus requires regularization in order to limit the search space to reasonable solutions. A majority of segmentation methods integrates these regularization terms in one way or the other in an energy functional using a balancing term. The tuning of this parameter that either favours more the regularization or the data conformity is critical and, unfortunately, the success of the optimization process strongly depends on it. Often the optimal settings change from image to image. In this paper we propose a novel general framework based on an adaptive eigenspace that was first proposed for solving inverse problems. The resulting method proves accurate and yields robust results, without the need for optimization techniques or being sensitive to the parameter choice. In fact, the method solves a symmetric positive definite sparse system and hence, uses only a fraction of the computational cost. The method is very versatile and does not need parameter-tuning, when segmenting objects from any kind of an image or when segmenting different organs. As the adaptive eigenspace is determined directly from the image to segment, the approach also does not need a tedious training phase.
CVJan 19, 2023
Position Regression for Unsupervised Anomaly DetectionFlorentin Bieder, Julia Wolleb, Robin Sandkühler et al.
In recent years, anomaly detection has become an essential field in medical image analysis. Most current anomaly detection methods for medical images are based on image reconstruction. In this work, we propose a novel anomaly detection approach based on coordinate regression. Our method estimates the position of patches within a volume, and is trained only on data of healthy subjects. During inference, we can detect and localize anomalies by considering the error of the position estimate of a given patch. We apply our method to 3D CT volumes and evaluate it on patients with intracranial haemorrhages and cranial fractures. The results show that our method performs well in detecting these anomalies. Furthermore, we show that our method requires less memory than comparable approaches that involve image reconstruction. This is highly relevant for processing large 3D volumes, for instance, CT or MRI scans.
CVMar 6
Longitudinal NSCLC Treatment Progression via Multimodal Generative ModelsMassimiliano Mantegna, Elena Mulero Ayllón, Alice Natalina Caragliano et al.
Predicting tumor evolution during radiotherapy is a clinically critical challenge, particularly when longitudinal changes are driven by both anatomy and treatment. In this work, we introduce a Virtual Treatment (VT) framework that formulates non-small cell lung cancer (NSCLC) progression as a dose-aware multimodal conditional image-to-image translation problem. Given a CT scan, baseline clinical variables, and a specified radiation dose increment, VT aims to synthesize plausible follow-up CT images reflecting treatment-induced anatomical changes. We evaluate the proposed framework on a longitudinal dataset of 222 stage III NSCLC patients, comprising 895 CT scans acquired during radiotherapy under irregular clinical schedules. The generative process is conditioned on delivered dose increments together with demographic and tumor-related clinical variables. Representative GAN-based and diffusion-based models are benchmarked across 2D and 2.5D configurations. Quantitative and qualitative results indicate that diffusion-based models benefit more consistently from multimodal, dose-aware conditioning and produce more stable and anatomically plausible tumor evolution trajectories than GAN-based baselines, supporting the potential of VT as a tool for in-silico treatment monitoring and adaptive radiotherapy research in NSCLC.
IVAug 16, 2024
Modeling the Neonatal Brain Development Using Implicit Neural RepresentationsFlorentin Bieder, Paul Friedrich, Hélène Corbaz et al.
The human brain undergoes rapid development during the third trimester of pregnancy. In this work, we model the neonatal development of the infant brain in this age range. As a basis, we use MR images of preterm- and term-birth neonates from the developing human connectome project (dHCP). We propose a neural network, specifically an implicit neural representation (INR), to predict 2D- and 3D images of varying time points. In order to model a subject-specific development process, it is necessary to disentangle the age from the subjects' identity in the latent space of the INR. We propose two methods, Subject Specific Latent Vectors (SSL) and Stochastic Global Latent Augmentation (SGLA), enabling this disentanglement. We perform an analysis of the results and compare our proposed model to an age-conditioned denoising diffusion model as a baseline. We also show that our method can be applied in a memory-efficient way, which is especially important for 3D data.
CVMar 18, 2024Code
Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly DetectionJulia Wolleb, Florentin Bieder, Paul Friedrich et al.
The high performance of denoising diffusion models for image generation has paved the way for their application in unsupervised medical anomaly detection. As diffusion-based methods require a lot of GPU memory and have long sampling times, we present a novel and fast unsupervised anomaly detection approach based on latent Bernoulli diffusion models. We first apply an autoencoder to compress the input images into a binary latent representation. Next, a diffusion model that follows a Bernoulli noise schedule is employed to this latent space and trained to restore binary latent representations from perturbed ones. The binary nature of this diffusion model allows us to identify entries in the latent space that have a high probability of flipping their binary code during the denoising process, which indicates out-of-distribution data. We propose a masking algorithm based on these probabilities, which improves the anomaly detection scores. We achieve state-of-the-art performance compared to other diffusion-based unsupervised anomaly detection algorithms while significantly reducing sampling time and memory consumption. The code is available at https://github.com/JuliaWolleb/Anomaly_berdiff.
CVMar 2
AutoFFS: Adversarial Deformations for Facial Feminization Surgery PlanningPaul Friedrich, Florentin Bieder, Florian M. Thieringer et al.
Facial feminization surgery (FFS) is a key component of gender affirmation for transgender and gender diverse patients, aiming to reshape craniofacial structures toward a female morphology. Current surgical planning procedures largely rely on subjective clinical assessment, lacking quantitative and reproducible anatomical guidance. We therefore propose AutoFFS, a novel data-driven framework that generates counterfactual skull morphologies through adversarial free-form deformations. Our method performs a deformation-based targeted adversarial attack on an ensemble of pre-trained binary sex classifiers that learned sexual dimorphism, effectively transforming individual skull shapes toward the target sex. The generated counterfactual skull morphologies provide a quantitative foundation for preoperative planning in FFS, driving advances in this largely overlooked patient group. We validate our approach through classifier-based evaluation and a human perceptual study, confirming that the generated morphologies exhibit target sex characteristics.
CVDec 15, 2025
End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine SurgeryLorenzo Pettinari, Sidaty El Hadramy, Michael Wehrli et al.
Purpose: Intraoperative navigation in spine surgery demands millimeter-level accuracy. Current systems based on intraoperative radiographic imaging and bone-anchored markers are invasive, radiation-intensive and workflow disruptive. Recent markerless RGB-D registration methods offer a promising alternative, but existing approaches rely on weak segmentation labels to isolate relevant anatomical structures, which can propagate errors throughout registration. Methods: We present End2Reg an end-to-end deep learning framework that jointly optimizes segmentation and registration, eliminating the need for weak segmentation labels and manual steps. The network learns segmentation masks specifically optimized for registration, guided solely by the registration objective without direct segmentation supervision. Results: The proposed framework achieves state-of-the-art performance on ex- and in-vivo benchmarks, reducing median Target Registration Error by 32% to 1.83mm and mean Root Mean Square Error by 45% to 3.95mm, respectively. An ablation study confirms that end-to-end optimization significantly improves registration accuracy. Conclusion: The presented end-to-end RGB-D registration pipeline removes dependency on weak labels and manual steps, advancing towards fully automatic, markerless intraoperative navigation. Code and interactive visualizations are available at: https://lorenzopettinari.github.io/end-2-reg/.
IVJul 17, 2025Code
fastWDM3D: Fast and Accurate 3D Healthy Tissue InpaintingAlicia Durrer, Florentin Bieder, Paul Friedrich et al.
Healthy tissue inpainting has significant applications, including the generation of pseudo-healthy baselines for tumor growth models and the facilitation of image registration. In previous editions of the BraTS Local Synthesis of Healthy Brain Tissue via Inpainting Challenge, denoising diffusion probabilistic models (DDPMs) demonstrated qualitatively convincing results but suffered from low sampling speed. To mitigate this limitation, we adapted a 2D image generation approach, combining DDPMs with generative adversarial networks (GANs) and employing a variance-preserving noise schedule, for the task of 3D inpainting. Our experiments showed that the variance-preserving noise schedule and the selected reconstruction losses can be effectively utilized for high-quality 3D inpainting in a few time steps without requiring adversarial training. We applied our findings to a different architecture, a 3D wavelet diffusion model (WDM3D) that does not include a GAN component. The resulting model, denoted as fastWDM3D, obtained a SSIM of 0.8571, a MSE of 0.0079, and a PSNR of 22.26 on the BraTS inpainting test set. Remarkably, it achieved these scores using only two time steps, completing the 3D inpainting process in 1.81 s per image. When compared to other DDPMs used for healthy brain tissue inpainting, our model is up to 800 x faster while still achieving superior performance metrics. Our proposed method, fastWDM3D, represents a promising approach for fast and accurate healthy tissue inpainting. Our code is available at https://github.com/AliciaDurrer/fastWDM3D.
IVNov 2, 2025
Deep Generative Models for Enhanced Vitreous OCT ImagingSimone Sarrocco, Philippe C. Cattin, Peter M. Maloca et al.
Purpose: To evaluate deep learning (DL) models for enhancing vitreous optical coherence tomography (OCT) image quality and reducing acquisition time. Methods: Conditional Denoising Diffusion Probabilistic Models (cDDPMs), Brownian Bridge Diffusion Models (BBDMs), U-Net, Pix2Pix, and Vector-Quantised Generative Adversarial Network (VQ-GAN) were used to generate high-quality spectral-domain (SD) vitreous OCT images. Inputs were SD ART10 images, and outputs were compared to pseudoART100 images obtained by averaging ten ART10 images per eye location. Model performance was assessed using image quality metrics and Visual Turing Tests, where ophthalmologists ranked generated images and evaluated anatomical fidelity. The best model's performance was further tested within the manually segmented vitreous on newly acquired data. Results: U-Net achieved the highest Peak Signal-to-Noise Ratio (PSNR: 30.230) and Structural Similarity Index Measure (SSIM: 0.820), followed by cDDPM. For Learned Perceptual Image Patch Similarity (LPIPS), Pix2Pix (0.697) and cDDPM (0.753) performed best. In the first Visual Turing Test, cDDPM ranked highest (3.07); in the second (best model only), cDDPM achieved a 32.9% fool rate and 85.7% anatomical preservation. On newly acquired data, cDDPM generated vitreous regions more similar in PSNR to the ART100 reference than true ART1 or ART10 B-scans and achieved higher PSNR on whole images when conditioned on ART1 than ART10. Conclusions: Results reveal discrepancies between quantitative metrics and clinical evaluation, highlighting the need for combined assessment. cDDPM showed strong potential for generating clinically meaningful vitreous OCT images while reducing acquisition time fourfold. Translational Relevance: cDDPMs show promise for clinical integration, supporting faster, higher-quality vitreous imaging. Dataset and code will be made publicly available.
IVAug 22, 2025Code
Towards Diagnostic Quality Flat-Panel Detector CT Imaging Using Diffusion ModelsHélène Corbaz, Anh Nguyen, Victor Schulze-Zachau et al.
Patients undergoing a mechanical thrombectomy procedure usually have a multi-detector CT (MDCT) scan before and after the intervention. The image quality of the flat panel detector CT (FDCT) present in the intervention room is generally much lower than that of a MDCT due to significant artifacts. However, using only FDCT images could improve patient management as the patient would not need to be moved to the MDCT room. Several studies have evaluated the potential use of FDCT imaging alone and the time that could be saved by acquiring the images before and/or after the intervention only with the FDCT. This study proposes using a denoising diffusion probabilistic model (DDPM) to improve the image quality of FDCT scans, making them comparable to MDCT scans. Clinicans evaluated FDCT, MDCT, and our model's predictions for diagnostic purposes using a questionnaire. The DDPM eliminated most artifacts and improved anatomical visibility without reducing bleeding detection, provided that the input FDCT image quality is not too low. Our code can be found on github.
IVFeb 20, 2025Code
MedFuncta: A Unified Framework for Learning Efficient Medical Neural FieldsPaul Friedrich, Florentin Bieder, Julian McGinnis et al.
Research in medical imaging primarily focuses on discrete data representations that poorly scale with grid resolution and fail to capture the often continuous nature of the underlying signal. Neural Fields (NFs) offer a powerful alternative by modeling data as continuous functions. While single-instance NFs have successfully been applied in medical contexts, extending them to large-scale medical datasets remains an open challenge. We therefore introduce MedFuncta, a unified framework for large-scale NF training on diverse medical signals. Building on Functa, our approach encodes data into a unified representation, namely a 1D latent vector, that modulates a shared, meta-learned NF, enabling generalization across a dataset. We revisit common design choices, introducing a non-constant frequency parameter $ω$ in widely used SIREN activations, and establish a connection between this $ω$-schedule and layer-wise learning rates, relating our findings to recent work in theoretical learning dynamics. We additionally introduce a scalable meta-learning strategy for shared network learning that employs sparse supervision during training, thereby reducing memory consumption and computational overhead while maintaining competitive performance. Finally, we evaluate MedFuncta across a diverse range of medical datasets and show how to solve relevant downstream tasks on our neural data representation. To promote further research in this direction, we release our code, model weights and the first large-scale dataset - MedNF - containing > 500 k latent vectors for multi-instance medical NFs.
IVFeb 29, 2024
WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image SynthesisPaul Friedrich, Julia Wolleb, Florentin Bieder et al.
Due to the three-dimensional nature of CT- or MR-scans, generative modeling of medical images is a particularly challenging task. Existing approaches mostly apply patch-wise, slice-wise, or cascaded generation techniques to fit the high-dimensional data into the limited GPU memory. However, these approaches may introduce artifacts and potentially restrict the model's applicability for certain downstream tasks. This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model on wavelet decomposed images. The presented approach is a simple yet effective way of scaling 3D diffusion models to high resolutions and can be trained on a single \SI{40}{\giga\byte} GPU. Experimental results on BraTS and LIDC-IDRI unconditional image generation at a resolution of $128 \times 128 \times 128$ demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores compared to recent GANs, Diffusion Models, and Latent Diffusion Models. Our proposed method is the only one capable of generating high-quality images at a resolution of $256 \times 256 \times 256$, outperforming all comparing methods.
IVMar 21, 2024
Denoising Diffusion Models for 3D Healthy Brain Tissue InpaintingAlicia Durrer, Julia Wolleb, Florentin Bieder et al.
Monitoring diseases that affect the brain's structural integrity requires automated analysis of magnetic resonance (MR) images, e.g., for the evaluation of volumetric changes. However, many of the evaluation tools are optimized for analyzing healthy tissue. To enable the evaluation of scans containing pathological tissue, it is therefore required to restore healthy tissue in the pathological areas. In this work, we explore and extend denoising diffusion models for consistent inpainting of healthy 3D brain tissue. We modify state-of-the-art 2D, pseudo-3D, and 3D methods working in the image space, as well as 3D latent and 3D wavelet diffusion models, and train them to synthesize healthy brain tissue. Our evaluation shows that the pseudo-3D model performs best regarding the structural-similarity index, peak signal-to-noise ratio, and mean squared error. To emphasize the clinical relevance, we fine-tune this model on data containing synthetic MS lesions and evaluate it on a downstream brain tissue segmentation task, whereby it outperforms the established FMRIB Software Library (FSL) lesion-filling method.
IVOct 23, 2024
Deep Generative Models for 3D Medical Image SynthesisPaul Friedrich, Yannik Frisch, Philippe C. Cattin
Deep generative modeling has emerged as a powerful tool for synthesizing realistic medical images, driving advances in medical image analysis, disease diagnosis, and treatment planning. This chapter explores various deep generative models for 3D medical image synthesis, with a focus on Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Denoising Diffusion Models (DDMs). We discuss the fundamental principles, recent advances, as well as strengths and weaknesses of these models and examine their applications in clinically relevant problems, including unconditional and conditional generation tasks like image-to-image translation and image reconstruction. We additionally review commonly used evaluation metrics for assessing image fidelity, diversity, utility, and privacy and provide an overview of current challenges in the field.
IVFeb 27, 2024
Denoising Diffusion Models for Inpainting of Healthy Brain TissueAlicia Durrer, Philippe C. Cattin, Julia Wolleb
This paper is a contribution to the "BraTS 2023 Local Synthesis of Healthy Brain Tissue via Inpainting Challenge". The task of this challenge is to transform tumor tissue into healthy tissue in brain magnetic resonance (MR) images. This idea originates from the problem that MR images can be evaluated using automatic processing tools, however, many of these tools are optimized for the analysis of healthy tissue. By solving the given inpainting task, we enable the automatic analysis of images featuring lesions, and further downstream tasks. Our approach builds on denoising diffusion probabilistic models. We use a 2D model that is trained using slices in which healthy tissue was cropped out and is learned to be inpainted again. This allows us to use the ground truth healthy tissue during training. In the sampling stage, we replace the slices containing diseased tissue in the original 3D volume with the slices containing the healthy tissue inpainting. With our approach, we achieve comparable results to the competing methods. On the validation set our model achieves a mean SSIM of 0.7804, a PSNR of 20.3525 and a MSE of 0.0113. In future we plan to extend our 2D model to a 3D model, allowing to inpaint the region of interest as a whole without losing context information of neighboring slices.
IVNov 26, 2024
cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image SynthesisPaul Friedrich, Alicia Durrer, Julia Wolleb et al.
This paper contributes to the "BraTS 2024 Brain MR Image Synthesis Challenge" and presents a conditional Wavelet Diffusion Model (cWDM) for directly solving a paired image-to-image translation task on high-resolution volumes. While deep learning-based brain tumor segmentation models have demonstrated clear clinical utility, they typically require MR scans from various modalities (T1, T1ce, T2, FLAIR) as input. However, due to time constraints or imaging artifacts, some of these modalities may be missing, hindering the application of well-performing segmentation algorithms in clinical routine. To address this issue, we propose a method that synthesizes one missing modality image conditioned on three available images, enabling the application of downstream segmentation models. We treat this paired image-to-image translation task as a conditional generation problem and solve it by combining a Wavelet Diffusion Model for high-resolution 3D image synthesis with a simple conditioning strategy. This approach allows us to directly apply our model to full-resolution volumes, avoiding artifacts caused by slice- or patch-wise data processing. While this work focuses on a specific application, the presented method can be applied to all kinds of paired image-to-image translation problems, such as CT $\leftrightarrow$ MR and MR $\leftrightarrow$ PET translation, or mask-conditioned anatomically guided image generation.
LGJul 7, 2025
Conditional Graph Neural Network for Predicting Soft Tissue Deformation and ForcesMadina Kojanazarova, Florentin Bieder, Robin Sandkühler et al.
Soft tissue simulation in virtual environments is becoming increasingly important for medical applications. However, the high deformability of soft tissue poses significant challenges. Existing methods rely on segmentation, meshing and estimation of stiffness properties of tissues. In addition, the integration of haptic feedback requires precise force estimation to enable a more immersive experience. We introduce a novel data-driven model, a conditional graph neural network (cGNN) to tackle this complexity. Our model takes surface points and the location of applied forces, and is specifically designed to predict the deformation of the points and the forces exerted on them. We trained our model on experimentally collected surface tracking data of a soft tissue phantom and used transfer learning to overcome the data scarcity by initially training it with mass-spring simulations and fine-tuning it with the experimental data. This approach improves the generalisation capability of the model and enables accurate predictions of tissue deformations and corresponding interaction forces. The results demonstrate that the model can predict deformations with a distance error of 0.35$\pm$0.03 mm for deformations up to 30 mm and the force with an absolute error of 0.37$\pm$0.05 N for forces up to 7.5 N. Our data-driven approach presents a promising solution to the intricate challenge of simulating soft tissues within virtual environments. Beyond its applicability in medical simulations, this approach holds the potential to benefit various fields where realistic soft tissue simulations are required.
IVOct 31, 2024
Denoising Diffusion Models for Anomaly Localization in Medical ImagesCosmin I. Bercea, Philippe C. Cattin, Julia A. Schnabel et al.
This chapter explores anomaly localization in medical images using denoising diffusion models. After providing a brief methodological background of these models, including their application to image reconstruction and their conditioning using guidance mechanisms, we provide an overview of available datasets and evaluation metrics suitable for their application to anomaly localization in medical images. In this context, we discuss supervision schemes ranging from fully supervised segmentation to semi-supervised, weakly supervised, self-supervised, and unsupervised methods, and provide insights into the effectiveness and limitations of these approaches. Furthermore, we highlight open challenges in anomaly localization, including detection bias, domain shift, computational cost, and model interpretability. Our goal is to provide an overview of the current state of the art in the field, outline research gaps, and highlight the potential of diffusion models for robust anomaly localization in medical images.
CVMar 6
3D CBCT Artefact Removal Using Perpendicular Score-Based Diffusion ModelsSusanne Schaub, Florentin Bieder, Matheus L. Oliveira et al.
Cone-beam computed tomography (CBCT) is a widely used 3D imaging technique in dentistry, offering high-resolution images while minimising radiation exposure for patients. However, CBCT is highly susceptible to artefacts arising from high-density objects such as dental implants, which can compromise image quality and diagnostic accuracy. To reduce artefacts, implant inpainting in the sequence of projections plays a crucial role in many artefact reduction approaches. Recently, diffusion models have achieved state-of-the-art results in image generation and have widely been applied to image inpainting tasks. However, to our knowledge, existing diffusion-based methods for implant inpainting operate on independent 2D projections. This approach neglects the correlations among individual projections, resulting in inconsistencies in the reconstructed images. To address this, we propose a 3D dental implant inpainting approach based on perpendicular score-based diffusion models, each trained in two different planes and operating in the projection domain. The 3D distribution of the projection series is modelled by combining the two 2D score-based diffusion models in the sampling scheme. Our results demonstrate the method's effectiveness in producing high-quality, artefact-reduced 3D CBCT images, making it a promising solution for improving clinical imaging.
LGOct 1, 2025
TabINR: An Implicit Neural Representation Framework for Tabular Data ImputationVincent Ochs, Florentin Bieder, Sidaty el Hadramy et al.
Tabular data builds the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. As missing values degrade the performance or hinder the applicability of downstream models, and while simple imputing strategies tend to introduce bias or distort the underlying data distribution, we require imputers that provide high-quality imputations, are robust across dataset sizes and yield fast inference. We therefore introduce TabINR, an auto-decoder based Implicit Neural Representation (INR) framework that models tables as neural functions. Building on recent advances in generalizable INRs, we introduce learnable row and feature embeddings that effectively deal with the discrete structure of tabular data and can be inferred from partial observations, enabling instance adaptive imputations without modifying the trained model. We evaluate our framework across a diverse range of twelve real-world datasets and multiple missingness mechanisms, demonstrating consistently strong imputation accuracy, mostly matching or outperforming classical (KNN, MICE, MissForest) and deep learning based models (GAIN, ReMasker), with the clearest gains on high-dimensional datasets.
CVSep 22, 2025
CPT-4DMR: Continuous sPatial-Temporal Representation for 4D-MRI ReconstructionXinyang Wu, Muheng Li, Xia Li et al.
Four-dimensional MRI (4D-MRI) is an promising technique for capturing respiratory-induced motion in radiation therapy planning and delivery. Conventional 4D reconstruction methods, which typically rely on phase binning or separate template scans, struggle to capture temporal variability, complicate workflows, and impose heavy computational loads. We introduce a neural representation framework that considers respiratory motion as a smooth, continuous deformation steered by a 1D surrogate signal, completely replacing the conventional discrete sorting approach. The new method fuses motion modeling with image reconstruction through two synergistic networks: the Spatial Anatomy Network (SAN) encodes a continuous 3D anatomical representation, while a Temporal Motion Network (TMN), guided by Transformer-derived respiratory signals, produces temporally consistent deformation fields. Evaluation using a free-breathing dataset of 19 volunteers demonstrates that our template- and phase-free method accurately captures both regular and irregular respiratory patterns, while preserving vessel and bronchial continuity with high anatomical fidelity. The proposed method significantly improves efficiency, reducing the total processing time from approximately five hours required by conventional discrete sorting methods to just 15 minutes of training. Furthermore, it enables inference of each 3D volume in under one second. The framework accurately reconstructs 3D images at any respiratory state, achieves superior performance compared to conventional methods, and demonstrates strong potential for application in 4D radiation therapy planning and real-time adaptive treatment.
CVAug 8, 2025
Towards MR-Based Trochleoplasty PlanningMichael Wehrli, Alicia Durrer, Paul Friedrich et al.
To treat Trochlear Dysplasia (TD), current approaches rely mainly on low-resolution clinical Magnetic Resonance (MR) scans and surgical intuition. The surgeries are planned based on surgeons experience, have limited adoption of minimally invasive techniques, and lead to inconsistent outcomes. We propose a pipeline that generates super-resolved, patient-specific 3D pseudo-healthy target morphologies from conventional clinical MR scans. First, we compute an isotropic super-resolved MR volume using an Implicit Neural Representation (INR). Next, we segment femur, tibia, patella, and fibula with a multi-label custom-trained network. Finally, we train a Wavelet Diffusion Model (WDM) to generate pseudo-healthy target morphologies of the trochlear region. In contrast to prior work producing pseudo-healthy low-resolution 3D MR images, our approach enables the generation of sub-millimeter resolved 3D shapes compatible for pre- and intraoperative use. These can serve as preoperative blueprints for reshaping the femoral groove while preserving the native patella articulation. Furthermore, and in contrast to other work, we do not require a CT for our pipeline - reducing the amount of radiation. We evaluated our approach on 25 TD patients and could show that our target morphologies significantly improve the sulcus angle (SA) and trochlear groove depth (TGD). The code and interactive visualization are available at https://wehrlimi.github.io/sr-3d-planning/.
CVJul 17, 2025
cIDIR: Conditioned Implicit Neural Representation for Regularized Deformable Image RegistrationSidaty El Hadramy, Oumeymah Cherkaoui, Philippe C. Cattin
Regularization is essential in deformable image registration (DIR) to ensure that the estimated Deformation Vector Field (DVF) remains smooth, physically plausible, and anatomically consistent. However, fine-tuning regularization parameters in learning-based DIR frameworks is computationally expensive, often requiring multiple training iterations. To address this, we propose cIDI, a novel DIR framework based on Implicit Neural Representations (INRs) that conditions the registration process on regularization hyperparameters. Unlike conventional methods that require retraining for each regularization hyperparameter setting, cIDIR is trained over a prior distribution of these hyperparameters, then optimized over the regularization hyperparameters by using the segmentations masks as an observation. Additionally, cIDIR models a continuous and differentiable DVF, enabling seamless integration of advanced regularization techniques via automatic differentiation. Evaluated on the DIR-LAB dataset, $\operatorname{cIDIR}$ achieves high accuracy and robustness across the dataset.
MED-PHJun 6, 2024
Deep-Learning Approach for Tissue Classification using Acoustic Waves during Ablation with an Er:YAG Laser (Updated)Carlo Seppi, Philippe C. Cattin
Today's mechanical tools for bone cutting (osteotomy) cause mechanical trauma that prolongs the healing process. Medical device manufacturers aim to minimize this trauma, with minimally invasive surgery using laser cutting as one innovation. This method ablates tissue using laser light instead of mechanical tools, reducing post-surgery healing time. A reliable feedback system is crucial during laser surgery to prevent damage to surrounding tissues. We propose a tissue classification method analyzing acoustic waves generated during laser ablation, demonstrating its applicability in an ex-vivo experiment. The ablation process with a microsecond pulsed Er:YAG laser produces acoustic waves, acquired with an air-coupled transducer. These waves were used to classify five porcine tissue types: hard bone, soft bone, muscle, fat, and skin. For automated tissue classification, we compared five Neural Network (NN) approaches: a one-dimensional Convolutional Neural Network (CNN) with time-dependent input, a Fully-connected Neural Network (FcNN) with either the frequency spectrum or principal components of the frequency spectrum as input, and a combination of a CNN and an FcNN with time-dependent data and its frequency spectrum as input. Consecutive acoustic waves were used to improve classification accuracy. Grad-Cam identified the activation map of the frequencies, showing low frequencies as the most important for this task. Our results indicated that combining time-dependent data with its frequency spectrum achieved the highest classification accuracy (65.5%-75.5%). We also found that using the frequency spectrum alone was sufficient, with no additional benefit from applying Principal Components Analysis (PCA).
IVMay 15, 2023
The Brain Tumor Segmentation (BraTS) Challenge: Local Synthesis of Healthy Brain Tissue via InpaintingFlorian Kofler, Felix Meissen, Felix Steinbauer et al.
A myriad of algorithms for the automatic analysis of brain MR images is available to support clinicians in their decision-making. For brain tumor patients, the image acquisition time series typically starts with an already pathological scan. This poses problems, as many algorithms are designed to analyze healthy brains and provide no guarantee for images featuring lesions. Examples include, but are not limited to, algorithms for brain anatomy parcellation, tissue segmentation, and brain extraction. To solve this dilemma, we introduce the BraTS inpainting challenge. Here, the participants explore inpainting techniques to synthesize healthy brain scans from lesioned ones. The following manuscript contains the task formulation, dataset, and submission procedure. Later, it will be updated to summarize the findings of the challenge. The challenge is organized as part of the ASNR-BraTS MICCAI challenge.
CVDec 6, 2021
Diffusion Models for Implicit Image Segmentation EnsemblesJulia Wolleb, Robin Sandkühler, Florentin Bieder et al.
Diffusion models have shown impressive performance for generative modelling of images. In this paper, we present a novel semantic segmentation method based on diffusion models. By modifying the training and sampling scheme, we show that diffusion models can perform lesion segmentation of medical images. To generate an image specific segmentation, we train the model on the ground truth segmentation, and use the image as a prior during training and in every step during the sampling process. With the given stochastic sampling process, we can generate a distribution of segmentation masks. This property allows us to compute pixel-wise uncertainty maps of the segmentation, and allows an implicit ensemble of segmentations that increases the segmentation performance. We evaluate our method on the BRATS2020 dataset for brain tumor segmentation. Compared to state-of-the-art segmentation models, our approach yields good segmentation results and, additionally, detailed uncertainty maps.
CVOct 13, 2021
Learn to Ignore: Domain Adaptation for Multi-Site MRI AnalysisJulia Wolleb, Robin Sandkühler, Florentin Bieder et al.
The limited availability of large image datasets, mainly due to data privacy and differences in acquisition protocols or hardware, is a significant issue in the development of accurate and generalizable machine learning methods in medicine. This is especially the case for Magnetic Resonance (MR) images, where different MR scanners introduce a bias that limits the performance of a machine learning model. We present a novel method that learns to ignore the scanner-related features present in MR images, by introducing specific additional constraints on the latent space. We focus on a real-world classification scenario, where only a small dataset provides images of all classes. Our method \textit{Learn to Ignore (L2I)} outperforms state-of-the-art domain adaptation methods on a multi-site MR dataset for a classification task between multiple sclerosis patients and healthy controls.
CVMar 2, 2021
Comparison of Methods Generalizing Max- and Average-PoolingFlorentin Bieder, Robin Sandkühler, Philippe C. Cattin
Max- and average-pooling are the most popular pooling methods for downsampling in convolutional neural networks. In this paper, we compare different pooling methods that generalize both max- and average-pooling. Furthermore, we propose another method based on a smooth approximation of the maximum function and put it into context with related methods. For the comparison, we use a VGG16 image classification network and train it on a large dataset of natural high-resolution images (Google Open Images v5). The results show that none of the more sophisticated methods perform significantly better in this classification task than standard max- or average-pooling.
IVOct 14, 2020
3D Segmentation Networks for Excessive Numbers of Classes: Distinct Bone Segmentation in Upper BodiesEva Schnider, Antal Horváth, Georg Rauter et al.
Segmentation of distinct bones plays a crucial role in diagnosis, planning, navigation, and the assessment of bone metastasis. It supplies semantic knowledge to visualisation tools for the planning of surgical interventions and the education of health professionals. Fully supervised segmentation of 3D data using Deep Learning methods has been extensively studied for many tasks but is usually restricted to distinguishing only a handful of classes. With 125 distinct bones, our case includes many more labels than typical 3D segmentation tasks. For this reason, the direct adaptation of most established methods is not possible. This paper discusses the intricacies of training a 3D segmentation network in a many-label setting and shows necessary modifications in network architecture, loss function, and data augmentation. As a result, we demonstrate the robustness of our method by automatically segmenting over one hundred distinct bones simultaneously in an end-to-end learnt fashion from a CT-scan.
IVJul 28, 2020
DeScarGAN: Disease-Specific Anomaly Detection with Weak SupervisionJulia Wolleb, Robin Sandkühler, Philippe C. Cattin
Anomaly detection and localization in medical images is a challenging task, especially when the anomaly exhibits a change of existing structures, e.g., brain atrophy or changes in the pleural space due to pleural effusions. In this work, we present a weakly supervised and detail-preserving method that is able to detect structural changes of existing anatomical structures. In contrast to standard anomaly detection methods, our method extracts information about the disease characteristics from two groups: a group of patients affected by the same disease and a healthy control group. Together with identity-preserving mechanisms, this enables our method to extract highly disease-specific characteristics for a more detailed detection of structural changes. We designed a specific synthetic data set to evaluate and compare our method against state-of-the-art anomaly detection methods. Finally, we show the performance of our method on chest X-ray images. Our method called DeScarGAN outperforms other anomaly detection methods on the synthetic data set and by visual inspection on the chest X-ray image data set.
IVAug 26, 2019
Accelerated Motion-Aware MR Imaging via Motion Prediction from K-Space CenterChristoph Jud, Damien Nguyen, Alina Giger et al.
Motion has been a challenge for magnetic resonance (MR) imaging ever since the MR has been invented. Especially in volumetric imaging of thoracic and abdominal organs, motion-awareness is essential for reducing motion artifacts in the final image. A recently proposed MR imaging approach copes with motion by observing the motion patterns during the acquisition. Repetitive scanning of the k-space center region enables the extraction of the patient motion while acquiring the remaining part of the k-space. Due to highly redundant measurements of the center, the required scanning time of over 11 min and the reconstruction time of 2 h exceed clinical applicability though. We propose an accelerated motion-aware MR imaging method where the motion is inferred from small-sized k-space center patches and an initial training phase during which the characteristic movements are modeled. Thereby, acquisition times are reduced by a factor of almost 2 and reconstruction times by two orders of magnitude. Moreover, we improve the existing motion-aware approach with a systematic temporal shift correction to achieve a sharper image reconstruction. We tested our method on 12 volunteers and scanned their lungs and abdomen under free breathing. We achieved equivalent to higher reconstruction quality using the motion-prediction compared to the slower existing approach.
CVJun 7, 2019
Recurrent Registration Neural Networks for Deformable Image RegistrationRobin Sandkühler, Simon Andermatt, Grzegorz Bauman et al.
Parametric spatial transformation models have been successfully applied to image registration tasks. In such models, the transformation of interest is parameterized by a fixed set of basis functions as for example B-splines. Each basis function is located on a fixed regular grid position among the image domain, because the transformation of interest is not known in advance. As a consequence, not all basis functions will necessarily contribute to the final transformation which results in a non-compact representation of the transformation. We reformulate the pairwise registration problem as a recursive sequence of successive alignments. For each element in the sequence, a local deformation defined by its position, shape, and weight is computed by our recurrent registration neural network. The sum of all local deformations yield the final spatial alignment of both images. Formulating the registration problem in this way allows the network to detect non-aligned regions in the images and to learn how to locally refine the registration properly. In contrast to current non-sequence-based registration methods, our approach iteratively applies local spatial deformations to the images until the desired registration accuracy is achieved. We trained our network on 2D magnetic resonance images of the lung and compared our method to a standard parametric B-spline registration. The experiments show, that our method performs on par for the accuracy but yields a more compact representation of the transformation. Furthermore, we achieve a speedup of around 15 compared to the B-spline registration.
CVJun 26, 2018
AirLab: Autograd Image Registration LaboratoryRobin Sandkühler, Christoph Jud, Simon Andermatt et al.
Medical image registration is an active research topic and forms a basis for many medical image analysis tasks. Although image registration is a rather general concept specialized methods are usually required to target a specific registration problem. The development and implementation of such methods has been tough so far as the gradient of the objective has to be computed. Also, its evaluation has to be performed preferably on a GPU for larger images and for more complex transformation models and regularization terms. This hinders researchers from rapid prototyping and poses hurdles to reproduce research results. There is a clear need for an environment which hides this complexity to put the modeling and the experimental exploration of registration methods into the foreground. With the "Autograd Image Registration Laboratory" (AIRLab), we introduce an open laboratory for image registration tasks, where the analytic gradients of the objective function are computed automatically and the device where the computations are performed, on a CPU or a GPU, is transparent. It is meant as a laboratory for researchers and developers enabling them to rapidly try out new ideas for registering images and to reproduce registration results which have already been published. AIRLab is implemented in Python using PyTorch as tensor and optimization library and SimpleITK for basic image IO. Therefore, it profits from recent advances made by the machine learning community concerning optimization and deep neural network models. The presented draft of this paper outlines AIRLab with first code snippets and performance analyses. A more exhaustive introduction will follow as a final version soon.
CVAug 9, 2017
Multi-dimensional Gated Recurrent Units for Automated Anatomical Landmark LocalizationSimon Andermatt, Simon Pezold, Michael Amann et al.
We present an automated method for localizing an anatomical landmark in three-dimensional medical images. The method combines two recurrent neural networks in a coarse-to-fine approach: The first network determines a candidate neighborhood by analyzing the complete given image volume. The second network localizes the actual landmark precisely and accurately in the candidate neighborhood. Both networks take advantage of multi-dimensional gated recurrent units in their main layers, which allow for high model complexity with a comparatively small set of parameters. We localize the medullopontine sulcus in 3D magnetic resonance images of the head and neck. We show that the proposed approach outperforms similar localization techniques both in terms of mean distance in millimeters and voxels w.r.t. manual labelings of the data. With a mean localization error of 1.7 mm, the proposed approach performs on par with neurological experts, as we demonstrate in an interrater comparison.
HCMay 22, 2017
Eye Tracker Accuracy: Quantitative Evaluation of the Invisible Eye Center LocationStephan Wyder, Philippe C. Cattin
Purpose. We present a new method to evaluate the accuracy of an eye tracker based eye localization system. Measuring the accuracy of an eye tracker's primary intention, the estimated point of gaze, is usually done with volunteers and a set of fixation points used as ground truth. However, verifying the accuracy of the location estimate of a volunteer's eye center in 3D space is not easily possible. This is because the eye center is an intangible point hidden by the iris. Methods. We evaluate the eye location accuracy by using an eye phantom instead of eyes of volunteers. For this, we developed a testing stage with a realistic artificial eye and a corresponding kinematic model, which we trained with μCT data. This enables us to precisely evaluate the eye location estimate of an eye tracker. Results. We show that the proposed testing stage with the corresponding kinematic model is suitable for such a validation. Further, we evaluate a particular eye tracker based navigation system and show that this system is able to successfully determine the eye center with sub-millimeter accuracy. Conclusions. We show the suitability of the evaluated eye tracker for eye interventions, using the proposed testing stage and the corresponding kinematic model. The results further enable specific enhancement of the navigation system to potentially get even better results.