Nils D. Forkert

CV
h-index47
13papers
333citations
Novelty42%
AI Score52

13 Papers

CVJul 17, 2022
Self-Supervised-RCNN for Medical Image Segmentation with Limited Data Annotation

Banafshe Felfeliyan, Abhilash Hareendranathan, Gregor Kuntze et al.

Many successful methods developed for medical image analysis that are based on machine learning use supervised learning approaches, which often require large datasets annotated by experts to achieve high accuracy. However, medical data annotation is time-consuming and expensive, especially for segmentation tasks. To solve the problem of learning with limited labeled medical image data, an alternative deep learning training strategy based on self-supervised pretraining on unlabeled MRI scans is proposed in this work. Our pretraining approach first, randomly applies different distortions to random areas of unlabeled images and then predicts the type of distortions and loss of information. To this aim, an improved version of Mask-RCNN architecture has been adapted to localize the distortion location and recover the original image pixels. The effectiveness of the proposed method for segmentation tasks in different pre-training and fine-tuning scenarios is evaluated based on the Osteoarthritis Initiative dataset. Using this self-supervised pretraining method improved the Dice score by 20% compared to training from scratch. The proposed self-supervised learning is simple, effective, and suitable for different ranges of medical image analysis tasks including anomaly detection, segmentation, and classification.

IVAug 20, 2024
ISLES'24: Final Infarct Prediction with Multimodal Imaging and Clinical Data. Where Do We Stand?

Ezequiel de la Rosa, Ruisheng Su, Mauricio Reyes et al.

Accurate estimation of brain infarction (i.e., irreversibly damaged tissue) is critical for guiding treatment decisions in acute ischemic stroke. Reliable infarct prediction informs key clinical interventions, including the need for patient transfer to comprehensive stroke centers, the potential benefit of additional reperfusion attempts during mechanical thrombectomy, decisions regarding secondary neuroprotective treatments, and ultimately, prognosis of clinical outcomes. This work introduces the Ischemic Stroke Lesion Segmentation (ISLES) 2024 challenge, which focuses on the prediction of final infarct volumes from pre-interventional acute stroke imaging and clinical data. ISLES24 provides a comprehensive, multimodal setting where participants can leverage all clinically and practically available data, including full acute CT imaging, sub-acute follow-up MRI, and structured clinical information, across a train set of 150 cases. On the hidden test set of 98 cases, the top-performing model, a multimodal nnU-Net-based architecture, achieved a Dice score of 0.285 (+/- 0.213) and an absolute volume difference of 21.2 (+/- 37.2) mL, underlining the significant challenges posed by this task and the need for further advances in multimodal learning. This work makes two primary contributions: first, we establish a standardized, clinically realistic benchmark for post-treatment infarct prediction, enabling systematic evaluation of multimodal algorithmic strategies on a longitudinal stroke dataset; second, we analyze current methodological limitations and outline key research directions to guide the development of next-generation infarct prediction models.

CVNov 3, 2023
Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging

Emma A. M. Stanley, Raissa Souza, Anthony Winder et al.

Artificial intelligence (AI) models trained using medical images for clinical tasks often exhibit bias in the form of disparities in performance between subgroups. Since not all sources of biases in real-world medical imaging data are easily identifiable, it is challenging to comprehensively assess how those biases are encoded in models, and how capable bias mitigation methods are at ameliorating performance disparities. In this article, we introduce a novel analysis framework for systematically and objectively investigating the impact of biases in medical images on AI models. We developed and tested this framework for conducting controlled in silico trials to assess bias in medical imaging AI using a tool for generating synthetic magnetic resonance images with known disease effects and sources of bias. The feasibility is showcased by using three counterfactual bias scenarios to measure the impact of simulated bias effects on a convolutional neural network (CNN) classifier and the efficacy of three bias mitigation strategies. The analysis revealed that the simulated biases resulted in expected subgroup performance disparities when the CNN was trained on the synthetic datasets. Moreover, reweighing was identified as the most successful bias mitigation strategy for this setup, and we demonstrated how explainable AI methods can aid in investigating the manifestation of bias in the model using this framework. Developing fair AI models is a considerable challenge given that many and often unknown sources of biases can be present in medical imaging datasets. In this work, we present a novel methodology to objectively study the impact of biases and mitigation strategies on deep learning pipelines, which can support the development of clinical AI that is robust and responsible.

CVSep 16, 2022
Weakly Supervised Medical Image Segmentation With Soft Labels and Noise Robust Loss

Banafshe Felfeliyan, Abhilash Hareendranathan, Gregor Kuntze et al.

Recent advances in deep learning algorithms have led to significant benefits for solving many medical image analysis problems. Training deep learning models commonly requires large datasets with expert-labeled annotations. However, acquiring expert-labeled annotation is not only expensive but also is subjective, error-prone, and inter-/intra- observer variability introduces noise to labels. This is particularly a problem when using deep learning models for segmenting medical images due to the ambiguous anatomical boundaries. Image-based medical diagnosis tools using deep learning models trained with incorrect segmentation labels can lead to false diagnoses and treatment suggestions. Multi-rater annotations might be better suited to train deep learning models with small training sets compared to single-rater annotations. The aim of this paper was to develop and evaluate a method to generate probabilistic labels based on multi-rater annotations and anatomical knowledge of the lesion features in MRI and a method to train segmentation models using probabilistic labels using normalized active-passive loss as a "noise-tolerant loss" function. The model was evaluated by comparing it to binary ground truth for 17 knees MRI scans for clinical segmentation and detection of bone marrow lesions (BML). The proposed method successfully improved precision 14, recall 22, and Dice score 8 percent compared to a binary cross-entropy loss function. Overall, the results of this work suggest that the proposed normalized active-passive loss using soft labels successfully mitigated the effects of noisy labels.

CVApr 13
Towards Brain MRI Foundation Models for the Clinic: Findings from the FOMO25 Challenge

Asbjørn Munk, Stefano Cerri, Vardan Nersesjan et al.

Clinical deployment of automated brain MRI analysis faces a fundamental challenge: clinical data is heterogeneous and noisy, and high-quality labels are prohibitively costly to obtain. Self-supervised learning (SSL) can address this by leveraging the vast amounts of unlabeled data produced in clinical workflows to train robust \textit{foundation models} that adapt out-of-domain with minimal supervision. However, the development of foundation models for brain MRI has been limited by small pretraining datasets and in-domain benchmarking focused on high-quality, research-grade data. To address this gap, we organized the FOMO25 challenge as a satellite event at MICCAI 2025. FOMO25 provided participants with a large pretraining dataset, FOMO60K, and evaluated models on data sourced directly from clinical workflows in few-shot and out-of-domain settings. Tasks covered infarct classification, meningioma segmentation, and brain age regression, and considered both models trained on FOMO60K (method track) and any data (open track). Nineteen foundation models from sixteen teams were evaluated using a standardized containerized pipeline. Results show that (a) self-supervised pretraining improves generalization on clinical data under domain shift, with the strongest models trained \textit{out-of-domain} surpassing supervised baselines trained \textit{in-domain}. (b) No single pretraining objective benefits all tasks: MAE favors segmentation, hybrid reconstruction-contrastive objectives favor classification, and (c) strong performance was achieved by small pretrained models, and improvements from scaling model size and training duration did not yield reliable benefits.

LGNov 28, 2025
CRAwDAD: Causal Reasoning Augmentation with Dual-Agent Debate

Finn G. Vamosi, Nils D. Forkert

When people reason about cause and effect, they often consider many competing "what if" scenarios before deciding which explanation fits best. Analogously, advanced language models capable of causal inference can consider multiple interventions and counterfactuals to judge the validity of causal claims. Crucially, this type of reasoning is less like a single calculation and more like an internal dialogue between alternative hypotheses. In this paper, we make this dialogue explicit through a dual-agent debate framework where one model provides a structured causal inference, and the other critically examines this reasoning for logical flaws. When disagreements arise, agents attempt to persuade each other, challenging each other's logic and revising their conclusions until they converge on a mutually agreed answer. To take advantage of this deliberative process, we specifically use reasoning language models, whose strengths in both causal inference and adversarial debate remain under-explored relative to standard large language models. We evaluate our approach on the CLadder dataset, a benchmark linking natural language questions to formally defined causal graphs across all three rungs of Pearl's ladder of causation. With Qwen3 and DeepSeek-R1 as debater agents, we demonstrate that multi-agent debate improves DeepSeek-R1's overall accuracy in causal inference from 78.03% to 87.45%, with the counterfactual category specifically improving from 67.94% to 80.04% accuracy. Similarly, Qwen3's overall accuracy improves from 84.16% to 89.41%, and counterfactual questions from 71.53% to 80.35%, showing that strong models can still benefit greatly from debate with weaker agents. Our results highlight the potential of reasoning models as building blocks for multi-agent systems in causal inference, and demonstrate the importance of diverse perspectives in causal problem-solving.

CVOct 11, 2025
Semi-disentangled spatiotemporal implicit neural representations of longitudinal neuroimaging data for trajectory classification

Agampreet Aulakh, Nils D. Forkert, Matthias Wilms

The human brain undergoes dynamic, potentially pathology-driven, structural changes throughout a lifespan. Longitudinal Magnetic Resonance Imaging (MRI) and other neuroimaging data are valuable for characterizing trajectories of change associated with typical and atypical aging. However, the analysis of such data is highly challenging given their discrete nature with different spatial and temporal image sampling patterns within individuals and across populations. This leads to computational problems for most traditional deep learning methods that cannot represent the underlying continuous biological process. To address these limitations, we present a new, fully data-driven method for representing aging trajectories across the entire brain by modelling subject-specific longitudinal T1-weighted MRI data as continuous functions using Implicit Neural Representations (INRs). Therefore, we introduce a novel INR architecture capable of partially disentangling spatial and temporal trajectory parameters and design an efficient framework that directly operates on the INRs' parameter space to classify brain aging trajectories. To evaluate our method in a controlled data environment, we develop a biologically grounded trajectory simulation and generate T1-weighted 3D MRI data for 450 healthy and dementia-like subjects at regularly and irregularly sampled timepoints. In the more realistic irregular sampling experiment, our INR-based method achieves 81.3% accuracy for the brain aging trajectory classification task, outperforming a standard deep learning baseline model (73.7%).

CVJul 24, 2025
Exploring the interplay of label bias with subgroup size and separability: A case study in mammographic density classification

Emma A. M. Stanley, Raghav Mehta, Mélanie Roschewitz et al.

Systematic mislabelling affecting specific subgroups (i.e., label bias) in medical imaging datasets represents an understudied issue concerning the fairness of medical AI systems. In this work, we investigated how size and separability of subgroups affected by label bias influence the learned features and performance of a deep learning model. Therefore, we trained deep learning models for binary tissue density classification using the EMory BrEast imaging Dataset (EMBED), where label bias affected separable subgroups (based on imaging manufacturer) or non-separable "pseudo-subgroups". We found that simulated subgroup label bias led to prominent shifts in the learned feature representations of the models. Importantly, these shifts within the feature space were dependent on both the relative size and the separability of the subgroup affected by label bias. We also observed notable differences in subgroup performance depending on whether a validation set with clean labels was used to define the classification threshold for the model. For instance, with label bias affecting the majority separable subgroup, the true positive rate for that subgroup fell from 0.898, when the validation set had clean labels, to 0.518, when the validation set had biased labels. Our work represents a key contribution toward understanding the consequences of label bias on subgroup fairness in medical imaging AI.

CVJun 3, 2025
Deep Learning for Retinal Degeneration Assessment: A Comprehensive Analysis of the MARIO AMD Progression Challenge

Rachid Zeghlache, Ikram Brahim, Pierre-Henri Conze et al.

The MARIO challenge, held at MICCAI 2024, focused on advancing the automated detection and monitoring of age-related macular degeneration (AMD) through the analysis of optical coherence tomography (OCT) images. Designed to evaluate algorithmic performance in detecting neovascular activity changes within AMD, the challenge incorporated unique multi-modal datasets. The primary dataset, sourced from Brest, France, was used by participating teams to train and test their models. The final ranking was determined based on performance on this dataset. An auxiliary dataset from Algeria was used post-challenge to evaluate population and device shifts from submitted solutions. Two tasks were involved in the MARIO challenge. The first one was the classification of evolution between two consecutive 2D OCT B-scans. The second one was the prediction of future AMD evolution over three months for patients undergoing anti-vascular endothelial growth factor (VEGF) therapy. Thirty-five teams participated, with the top 12 finalists presenting their methods. This paper outlines the challenge's structure, tasks, data characteristics, and winning methodologies, setting a benchmark for AMD monitoring using OCT, infrared imaging, and clinical data (such as the number of visits, age, gender, etc.). The results of this challenge indicate that artificial intelligence (AI) performs as well as a physician in measuring AMD progression (Task 1) but is not yet able of predicting future evolution (Task 2).

IVNov 2, 2021
Constructing High-Order Signed Distance Maps from Computed Tomography Data with Application to Bone Morphometry

Bryce A. Besler, Tannis D. Kemp, Nils D. Forkert et al.

An algorithm is presented for constructing high-order signed distance fields for two phase materials imaged with computed tomography. The signed distance field is high-order in that it is free of the quantization artifact associated with the distance transform of sampled signals. The narrowband is solved using a closest point algorithm extended for implicit embeddings that are not a signed distance field. The high-order fast sweeping algorithm is used to extend the narrowband to the remainder of the domain. The order of accuracy of the narrowband and extension methods are verified on ideal implicit surfaces. The method is applied to ten excised cubes of bovine trabecular bone. Localization of the surface, estimation of phase densities, and local morphometry is validated with these subjects. Since the embedding is high-order, gradients and thus curvatures can be accurately estimated locally in the image data.

CVJul 29, 2021
Local Morphometry of Closed, Implicit Surfaces

Bryce A Besler, Tannis D. Kemp, Andrew S. Michalski et al.

Anatomical structures such as the hippocampus, liver, and bones can be analyzed as orientable, closed surfaces. This permits the computation of volume, surface area, mean curvature, Gaussian curvature, and the Euler-Poincaré characteristic as well as comparison of these morphometrics between structures of different topology. The structures are commonly represented implicitly in curve evolution problems as the zero level set of an embedding. Practically, binary images of anatomical structures are embedded using a signed distance transform. However, quantization prevents the accurate computation of curvatures, leading to considerable errors in morphometry. This paper presents a fast, simple embedding procedure for accurate local morphometry as the zero crossing of the Gaussian blurred binary image. The proposed method was validated based on the femur and fourth lumbar vertebrae of 50 clinical computed tomography datasets. The results show that the signed distance transform leads to large quantization errors in the computed local curvature. Global validation of morphometry using regression and Bland-Altman analysis revealed that the coefficient of determination for the average mean curvature is improved from 93.8% with the signed distance transform to 100% with the proposed method. For the surface area, the proportional bias is improved from -5.0% for the signed distance transform to +0.6% for the proposed method. The Euler-Poincaré characteristic is improved from unusable in the signed distance transform to 98% accuracy for the proposed method. The proposed method enables an improved local and global evaluation of curvature for purposes of morphometry on closed, implicit surfaces.

CVNov 26, 2020
Bidirectional Modeling and Analysis of Brain Aging with Normalizing Flows

Matthias Wilms, Jordan J. Bannister, Pauline Mouches et al.

Brain aging is a widely studied longitudinal process throughout which the brain undergoes considerable morphological changes and various machine learning approaches have been proposed to analyze it. Within this context, brain age prediction from structural MR images and age-specific brain morphology template generation are two problems that have attracted much attention. While most approaches tackle these tasks independently, we assume that they are inverse directions of the same functional bidirectional relationship between a brain's morphology and an age variable. In this paper, we propose to model this relationship with a single conditional normalizing flow, which unifies brain age prediction and age-conditioned generative modeling in a novel way. In an initial evaluation of this idea, we show that our normalizing flow brain aging model can accurately predict brain age while also being able to generate age-specific brain morphology templates that realistically represent the typical aging trend in a healthy population. This work is a step towards unified modeling of functional relationships between 3D brain morphology and clinical variables of interest with powerful normalizing flows.

CVMar 25, 2018
DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes

Giles Tetteh, Velizar Efremov, Nils D. Forkert et al.

We present DeepVesselNet, an architecture tailored to the challenges faced when extracting vessel networks or trees and corresponding features in 3-D angiographic volumes using deep learning. We discuss the problems of low execution speed and high memory requirements associated with full 3-D convolutional networks, high-class imbalance arising from the low percentage of vessel voxels, and unavailability of accurately annotated training data - and offer solutions as the building blocks of DeepVesselNet. First, we formulate 2-D orthogonal cross-hair filters which make use of 3-D context information at a reduced computational burden. Second, we introduce a class balancing cross-entropy loss function with false positive rate correction to handle the high-class imbalance and high false positive rate problems associated with existing loss functions. Finally, we generate synthetic dataset using a computational angiogenesis model capable of generating vascular trees under physiological constraints on local network structure and topology and use these data for transfer learning. DeepVesselNet is optimized for segmenting and analyzing vessels, and we test the performance on a range of angiographic volumes including clinical MRA data of the human brain, as well as X-ray tomographic microscopy scans of the rat brain. Our experiments show that, by replacing 3-D filters with cross-hair filters in our network, we achieve over 23% improvement in speed, lower memory footprint, lower network complexity which prevents overfitting and comparable accuracy (with a Cox-Wilcoxon paired sample significance test p-value of 0.07 when compared to full 3-D filters). Our class balancing metric is crucial for training the network and transfer learning with synthetic data is an efficient, robust, and very generalizable approach leading to a network that excels in a variety of angiography segmentation tasks.