Mark Jenkinson

IV
h-index 34
12 papers
142 citations
Novelty 41%
AI Score 49

12 Papers

LG · May 31, 2022 · Code
FedHarmony: Unlearning Scanner Bias with Distributed Data

Nicola K Dinsdale, Mark Jenkinson, Ana IL Namburete

The ability to combine data across scanners and studies is vital for neuroimaging, to increase both statistical power and the representation of biological variability. However, combining datasets across sites leads to two challenges: first, an increase in undesirable non-biological variance due to scanner and acquisition differences - the harmonisation problem - and second, data privacy concerns due to the inherently personal nature of medical imaging data, meaning that sharing them across sites may risk violation of privacy laws. To overcome these restrictions, we propose FedHarmony: a harmonisation framework operating in the federated learning paradigm. We show that to remove the scanner-specific effects, we only need to share the mean and standard deviation of the learned features, helping to protect individual subjects' privacy. We demonstrate our approach across a range of realistic data scenarios, using real multi-site data from the ABIDE dataset, thus showing the potential utility of our method for MRI harmonisation across studies. Our code is available at https://github.com/nkdinsdale/FedHarmony.
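The abstract's claim that sharing means and standard deviations is enough rests on simple arithmetic: per-site summaries determine the pooled statistics exactly. The sketch below illustrates only that arithmetic. It is not the FedHarmony aggregation step. It assumes each site also reports its sample count, which the abstract does not state. The function name and the `(n, mean, std)` tuple layout are hypothetical.

```python
import math

def pool_site_stats(site_stats):
    """Combine per-site (n, mean, std) summaries of one feature into a global
    mean and std without access to any individual subject's values.

    Uses population (biased) standard deviations. By the law of total
    variance, var_global = E[var_site + mean_site^2] - mean_global^2.
    """
    total_n = sum(n for n, _, _ in site_stats)
    mean = sum(n * m for n, m, _ in site_stats) / total_n
    # Pooled second moment: within-site variance plus squared site mean.
    second_moment = sum(n * (s ** 2 + m ** 2) for n, m, s in site_stats) / total_n
    var = second_moment - mean ** 2
    return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors
```

Site A with values `[1, 3]` gives `(2, 2.0, 1.0)` and site B with `[5, 7]` gives `(2, 6.0, 1.0)`. Pooling them reproduces the statistics of `[1, 3, 5, 7]`: mean 4 and standard deviation √5.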

CV · Mar 28, 2023 · Code
SFHarmony: Source Free Domain Adaptation for Distributed Neuroimaging Analysis

Nicola K Dinsdale, Mark Jenkinson, Ana IL Namburete

To represent the biological variability of clinical neuroimaging populations, it is vital to be able to combine data across scanners and studies. However, different MRI scanners produce images with different characteristics, resulting in a domain shift known as the 'harmonisation problem'. Additionally, neuroimaging data is inherently personal in nature, leading to data privacy concerns when sharing the data. To overcome these barriers, we propose an Unsupervised Source-Free Domain Adaptation (SFDA) method, SFHarmony. Through modelling the imaging features as a Gaussian Mixture Model and minimising an adapted Bhattacharyya distance between the source and target features, we can create a model that performs well for the target data whilst having a shared feature representation across the data domains, without needing access to the source data for adaptation or target labels. We demonstrate the performance of our method on simulated and real domain shifts, showing that the approach is applicable to classification, segmentation and regression tasks, requiring no changes to the algorithm. Our method outperforms existing SFDA approaches across a range of realistic data scenarios, demonstrating the potential utility of our approach for MRI harmonisation and general SFDA problems. Our code is available at https://github.com/nkdinsdale/SFHarmony.
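The building block behind the loss named above is the Bhattacharyya distance between two Gaussians. The abstract does not say how SFHarmony adapts it to Gaussian Mixture Models, so the sketch below shows only the standard distance between two Gaussians with diagonal covariances. For diagonal covariances it decomposes into a sum of per-dimension terms. The function name and the list-based inputs are assumptions for illustration.

```python
import math

def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances.

    Each dimension contributes
        1/4 * (m1 - m2)^2 / (v1 + v2) + 1/2 * ln((v1 + v2) / (2 * sqrt(v1 * v2))),
    where the first term compares means and the second compares spreads.
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        total += 0.25 * (m1 - m2) ** 2 / (v1 + v2)
        total += 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return total
```

Identical Gaussians give 0. Unit-variance Gaussians whose means are 2 apart give 0.5. The distance is symmetric in its two arguments, which is one reason it is a convenient alignment objective when only summary statistics of the source features are available.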

IV · Mar 9, 2023 · Code
Segmentation method for cerebral blood vessels from MRA using hysteresis

Georgia Kenyon, Stephan Lau, Michael A. Chappell et al.

Segmentation of cerebral blood vessels from Magnetic Resonance Imaging (MRI) is an open problem that could be solved with deep learning (DL). However, annotated data for training is often scarce. Due to the absence of open-source tools, we aim to develop a classical segmentation method that generates vessel ground truth from Magnetic Resonance Angiography (MRA) for DL training of segmentation across a variety of modalities. The method combines size-specific Hessian filters, hysteresis thresholding and connected component correction. The optimal choice of processing steps was evaluated through blinded scoring by a clinician on 24 3D images. The results show that all method steps are necessary to produce the highest vessel segmentation quality score (14.2/15). Omitting the connected component correction caused the largest quality loss. The method, which is available on GitHub, can be used to train DL models for vessel segmentation.
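Hysteresis thresholding keeps weak responses only when they connect to strong ones. This suits thin vessels whose filter response fades along their length. The sketch below is a generic 2D version on a nested list of filter responses with 4-connectivity. It is not the authors' 3D pipeline, and the `low`/`high` values in the example are hypothetical.

```python
from collections import deque

def hysteresis_threshold(image, low, high):
    """Binary mask: True where a pixel is >= low AND is 4-connected, through
    pixels >= low, to at least one 'strong' pixel >= high."""
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the region with every strong pixel.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= high:
                mask[r][c] = True
                queue.append((r, c))
    # Breadth-first growth from the seeds into neighbouring weak pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and image[nr][nc] >= low):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask
```

With `low=0.4` and `high=0.8`, the weak pixels (0.5) attached to the strong pixel (0.9) are kept. An isolated weak column is discarded even though it has the same intensity. A simple global threshold could not make that distinction.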

LG · Mar 1, 2022
Uncertainty categories in medical image segmentation: a study of source-related diversity

Luke Whitbread, Mark Jenkinson

Measuring uncertainties in the output of a deep learning method is useful in several ways, such as assisting with interpretation of the outputs, helping to build confidence with end users, and improving the training and performance of the networks. Several different methods have been proposed to estimate uncertainties, including those from epistemic (relating to the model used) and aleatoric (relating to the data) sources, using test-time dropout and test-time augmentation, respectively. Not only are these uncertainty sources different, but they are also governed by parameter settings (e.g., dropout rate or type and level of augmentation) that establish even more distinct uncertainty categories. This work investigates how much the uncertainties from these categories differ, in both magnitude and spatial pattern, to empirically address whether they provide usefully distinct information that should be captured whenever uncertainties are used. We use the well-characterised BraTS challenge dataset to demonstrate that there are substantial differences in both the magnitude and the spatial pattern of uncertainties from the different categories, and discuss the implications of these differences in various use cases.
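The two estimators named above share one recipe: repeat a stochastic prediction and measure the spread. Only the source of randomness changes, either dropout masks or input augmentations. The sketch uses a hypothetical one-layer linear "network" (`WEIGHTS`) and Gaussian input noise as a stand-in augmentation, both assumptions chosen to keep it self-contained. It shows how the chosen setting (dropout rate, noise level) changes the resulting uncertainty magnitude.

```python
import random
import statistics

WEIGHTS = [0.4, -0.2, 0.7, 0.1]  # hypothetical toy model: a single linear layer

def forward(x, drop_rate, rng):
    """One forward pass with inverted dropout applied to the inputs."""
    keep = 1.0 - drop_rate
    total = 0.0
    for w, xi in zip(WEIGHTS, x):
        if rng.random() < keep:        # this unit survives the dropout mask
            total += w * xi / keep     # rescale so the expected output is unchanged
    return total

def mc_dropout_uncertainty(x, drop_rate, n_samples=200, seed=0):
    """Test-time dropout: mean and std over repeated stochastic passes."""
    rng = random.Random(seed)
    preds = [forward(x, drop_rate, rng) for _ in range(n_samples)]
    return statistics.mean(preds), statistics.pstdev(preds)

def tta_uncertainty(x, noise_std, n_samples=200, seed=0):
    """Test-time augmentation: same recipe, randomness comes from the input."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        x_aug = [xi + rng.gauss(0.0, noise_std) for xi in x]
        preds.append(forward(x_aug, 0.0, rng))  # no dropout here
    return statistics.mean(preds), statistics.pstdev(preds)
```

For `x = [1.0, 2.0, 3.0, 4.0]`, a dropout rate of 0 gives zero spread. Raising the rate from 0.1 to 0.5 increases the predictive standard deviation, which illustrates why each parameter setting defines its own uncertainty category.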

LG · May 19
Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

Nikhil Cherian Kurian, Victor Caquilpan Parra, Abin Shoby et al.

Diagnostic performance in medical AI varies systematically across demographic groups, yet subgroup AUC can mask clinically important disparities. At a fixed inference-time operating point, some groups may exhibit over-diagnostic behaviour, characterized by elevated true and false positive rates, while others show under-diagnostic patterns with reduced true and false positive rates. These opposing tendencies can cancel in aggregate AUCs while producing meaningful inequities in clinical decision-making. Motivated by the need to assess and mitigate such disparities at the operating point and across multiple demographic attributes simultaneously, we propose a worst-group equalized-odds margin regularizer. The proposed regularizer explicitly targets subgroup-level deviations on both the true positive and false positive sides at inference. At each update, the method identifies subgroups defined by explicit demographic attributes (e.g., age, sex, and race) that exhibit the most extreme margin deviations and applies a unified penalty, enabling fairness optimization across multiple demographic axes without requiring explicit intersectional constraints. Across two medical imaging datasets in realistic multi-label settings, our method consistently reduces disparities in Equalized Odds and Equalized Opportunity with minimal impact on AUC, preserving diagnostic performance while improving fairness.
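The quantity the regulariser targets can be measured directly at a fixed threshold. The measurement: per subgroup, the deviation of the true- and false-positive rates from the overall rates, then the worst such deviation across all groups of all attributes. The sketch below is a non-differentiable evaluation of that worst-group gap, not the training penalty itself, whose exact form the abstract does not give. Function names, the threshold and the attribute dictionary layout are assumptions.

```python
def rates(y_true, y_pred):
    """True- and false-positive rates; None when a rate is undefined."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(pos) / len(pos) if pos else None
    fpr = sum(neg) / len(neg) if neg else None
    return tpr, fpr

def worst_group_eo_gap(y_true, scores, attributes, threshold=0.5):
    """Largest |TPR_g - TPR| or |FPR_g - FPR| over every group of every
    attribute, evaluated at one fixed operating point."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tpr_all, fpr_all = rates(y_true, y_pred)
    worst_gap, worst_group = 0.0, None
    for attr, groups in attributes.items():      # e.g. "sex", "age_band"
        for g in sorted(set(groups)):             # sorted for a deterministic result
            idx = [i for i, gi in enumerate(groups) if gi == g]
            tpr_g, fpr_g = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
            gaps = []
            if tpr_g is not None:
                gaps.append(abs(tpr_g - tpr_all))
            if fpr_g is not None:
                gaps.append(abs(fpr_g - fpr_all))
            if gaps and max(gaps) > worst_gap:
                worst_gap, worst_group = max(gaps), (attr, g)
    return worst_gap, worst_group
```

In the example used for testing, one age band is over-diagnosed (TPR = FPR = 1) and the other under-diagnosed (TPR = FPR = 0). The aggregate rates are both 0.5 and the sex groups show no gap, yet the worst-group gap is 0.5. This is the cancellation effect the abstract describes.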

IV · Jul 28, 2021 · Code
TEDS-Net: Enforcing Diffeomorphisms in Spatial Transformers to Guarantee Topology Preservation in Segmentations

Madeleine K. Wyburd, Nicola K. Dinsdale, Ana I. L. Namburete et al.

Accurate topology is key when performing meaningful anatomical segmentations; however, it is often overlooked in traditional deep learning methods. In this work we propose TEDS-Net: a novel segmentation method that guarantees accurate topology. Our method is built upon a continuous diffeomorphic framework, which enforces topology preservation. However, in practice, diffeomorphic fields are represented using a finite number of parameters and sampled using methods such as linear interpolation, which violates the theoretical guarantees. We therefore introduce additional modifications to enforce topology preservation more strictly. Our network learns how to warp a binary prior, with the desired topological characteristics, to complete the segmentation task. We tested our method on myocardium segmentation from an open-source 2D heart dataset. TEDS-Net preserved topology in 100% of the cases, compared to 90% for the U-Net, without sacrificing Hausdorff distance or Dice performance. Code will be made available at: www.github.com/mwyburd/TEDS-Net
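A common discrete check of whether a sampled deformation has stayed diffeomorphic is the sign of its Jacobian determinant. Where the determinant of the Jacobian of φ(p) = p + u(p) is ≤ 0, the warp folds and topology can break. The sketch below is that generic evaluation check with forward differences on a 2D displacement field. It is not TEDS-Net's enforcement mechanism, and the function names are hypothetical.

```python
def jacobian_determinants(ux, uy):
    """ux, uy: 2D lists (rows = y, cols = x) of displacements u(p).

    Returns det J of phi(p) = p + u(p) at every grid point where forward
    differences in both x and y exist (all but the last row and column)."""
    rows, cols = len(ux), len(ux[0])
    dets = []
    for y in range(rows - 1):
        row = []
        for x in range(cols - 1):
            dux_dx = ux[y][x + 1] - ux[y][x]
            dux_dy = ux[y + 1][x] - ux[y][x]
            duy_dx = uy[y][x + 1] - uy[y][x]
            duy_dy = uy[y + 1][x] - uy[y][x]
            # J = I + grad(u); determinant of the 2x2 matrix.
            row.append((1 + dux_dx) * (1 + duy_dy) - dux_dy * duy_dx)
        dets.append(row)
    return dets

def count_folds(ux, uy):
    """Number of grid points where the deformation is not locally invertible."""
    return sum(1 for row in jacobian_determinants(ux, uy) for d in row if d <= 0)
```

A zero displacement field has determinant 1 everywhere. A field that shifts one column 1.5 pixels back past its left neighbour produces one negative determinant, i.e. one fold.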

CV · Nov 23, 2025
From Healthy Scans to Annotated Tumors: A Tumor Fabrication Framework for 3D Brain MRI Synthesis

Nayu Dong, Townim Chowdhury, Hieu Phan et al.

The scarcity of annotated Magnetic Resonance Imaging (MRI) tumor data presents a major obstacle to accurate and automated tumor segmentation. While existing data synthesis methods offer promising solutions, they often suffer from key limitations: manual modeling is labor-intensive and requires expert knowledge, while deep generative models, which may be used to augment data and annotations, typically demand large numbers of training pairs in the first place, which is impractical in data-limited clinical settings. In this work, we propose Tumor Fabrication (TF), a novel two-stage framework for unpaired 3D brain tumor synthesis. The framework comprises a coarse tumor synthesis process followed by a refinement process powered by a generative model. TF is fully automated and leverages only healthy image scans along with a limited amount of real annotated data to synthesize large volumes of paired synthetic data for enriching downstream supervised segmentation training. We demonstrate that using our synthetic image-label pairs for data enrichment can significantly improve performance on downstream tumor segmentation tasks in low-data regimes, offering a scalable and reliable solution for medical image enrichment and addressing critical challenges of data scarcity in clinical AI applications.

IV · Sep 8, 2025
Validation of a CT-brain analysis tool for measuring global cortical atrophy in older patient cohorts

Sukhdeep Bal, Emma Colbourne, Jasmine Gan et al.

Quantification of brain atrophy currently requires visual rating scales, which are time-consuming, so automated brain image analysis is warranted. We validated our automated deep learning (DL) tool measuring the Global Cerebral Atrophy (GCA) score against trained human raters, and its associations with age and cognitive impairment, in representative older (>65 years) patients. CT-brain scans were obtained from patients in acute medicine (ORCHARD-EPR), acute stroke (OCS studies) and a legacy sample. Scans were divided in a 60/20/20 ratio for training, optimisation and testing. CT images were assessed by two trained raters (rater-1=864 scans, rater-2=20 scans). Agreement between DL tool-predicted GCA scores (range 0-39) and the visual ratings was evaluated using mean absolute error (MAE) and Cohen's weighted kappa. Among 864 scans (ORCHARD-EPR=578, OCS=200, legacy scans=86), MAE between the DL tool and rater-1 GCA scores was 3.2 overall, 3.1 for ORCHARD-EPR, 3.3 for OCS and 2.6 for the legacy scans, and half of the DL-predicted GCA scores had an error between -2 and 2. Inter-rater agreement was kappa=0.45 between the DL tool and rater-1, and 0.41 between the tool and rater-2, whereas it was lower, at 0.28, between rater-1 and rater-2. There was no difference in GCA scores from the DL tool and the two raters (one-way ANOVA, p=0.35) or in mean GCA scores between the DL tool and rater-1 (paired t-test, t=-0.43, p=0.66), the tool and rater-2 (t=1.35, p=0.18) or rater-1 and rater-2 (t=0.99, p=0.32). DL-tool GCA scores correlated with age and cognitive scores (both p<0.001). Our DL CT-brain analysis tool measured GCA score accurately and without user input in real-world scans acquired from older patients. Our tool will enable extraction of standardised quantitative measures of atrophy at scale for use in health data research and will act as a proof-of-concept towards a clinically approved point-of-care tool.
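The two agreement measures reported above can be written out in a few lines. The abstract does not say which kappa weighting was used, so the sketch defaults to linear weights and offers quadratic as an option. Both are assumptions here. For an integer score in the 0-39 range, `n_categories` would be 40. In practice, scikit-learn's `cohen_kappa_score` with `weights="linear"` or `weights="quadratic"` computes the same statistic.

```python
def mean_absolute_error(a, b):
    """Mean of |a_i - b_i| over paired scores."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def weighted_kappa(r1, r2, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two raters scoring the same items with
    integer categories 0..n_categories-1 (requires n_categories >= 2)."""
    n, k = len(r1), n_categories
    observed = [[0.0] * k for _ in range(k)]       # joint rating proportions
    for a, b in zip(r1, r2):
        observed[a][b] += 1.0 / n
    p1 = [sum(observed[i][j] for j in range(k)) for i in range(k)]  # rater-1 marginals
    p2 = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater-2 marginals

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d   # else: quadratic

    # kappa_w = 1 - (weighted observed disagreement) / (weighted chance disagreement)
    num = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - num / den
```

Perfect agreement gives a kappa of 1. Two raters who always swap the two categories of a binary score give -1. Weighting makes near-misses (e.g. a GCA of 12 vs 13) count less against agreement than large disagreements.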

IV · Jan 25, 2022
Mutual information neural estimation for unsupervised multi-modal registration of brain images

Gerard Snaauw, Michele Sasdelli, Gabriel Maicas et al.

Many applications in image-guided surgery and therapy require fast and reliable non-linear, multi-modal image registration. Recently proposed unsupervised deep learning-based registration methods have demonstrated superior performance compared to iterative methods in just a fraction of the time. Most of the learning-based methods have focused on mono-modal image registration. The extension to multi-modal registration depends on the use of an appropriate similarity function, such as the mutual information (MI). We propose guiding the training of a deep learning-based registration method with MI estimation between an image pair in an end-to-end trainable network. Our results show that a small, 2-layer network produces competitive results in both mono- and multi-modal registration, with sub-second run-times. Comparisons to both iterative and deep learning-based methods show that our MI-based method produces topologically and qualitatively superior results with an extremely low rate of non-diffeomorphic transformations. Real-time clinical applications will benefit from better visual matching of anatomical structures and fewer registration failures and outliers.
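Mutual information neural estimation (MINE) trains a critic network T to maximise the Donsker-Varadhan lower bound, I(X;Y) ≥ E_joint[T] - log E_marginal[exp(T)]. Marginal samples are usually obtained by shuffling one variable. The sketch below computes that bound on samples but replaces the trained neural critic with a fixed, hand-written one. The critic is the analytic optimum for a correlated Gaussian pair, an assumption made so that the result can be checked against the known MI of -½ ln(1 - ρ²). It illustrates the estimator only, not the registration network from the paper.

```python
import math
import random

def dv_lower_bound(xs, ys, critic, seed=0):
    """Donsker-Varadhan estimate: mean_joint T - log mean_marginal exp(T).
    Pairing xs with a shuffled copy of ys approximates the product of marginals."""
    rng = random.Random(seed)
    ys_shuffled = ys[:]
    rng.shuffle(ys_shuffled)
    joint = sum(critic(x, y) for x, y in zip(xs, ys)) / len(xs)
    marginal = sum(math.exp(critic(x, y)) for x, y in zip(xs, ys_shuffled)) / len(xs)
    return joint - math.log(marginal)

def correlated_gaussians(n, rho, seed=0):
    """Samples from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0))
    return xs, ys

RHO = 0.5  # hypothetical correlation for the toy example

def optimal_critic(x, y, rho=RHO):
    """log p(x, y) / (p(x) p(y)) for the standard bivariate Gaussian."""
    s = 1 - rho ** 2
    return -0.5 * math.log(s) - (x * x - 2 * rho * x * y + y * y) / (2 * s) + (x * x + y * y) / 2
```

With 5000 samples the bound lands near the true value of about 0.144 nats. A constant critic gives exactly 0, the weakest possible bound. Training a network to push the bound upwards is what turns this into an MI estimator usable as a registration loss.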

IV · Jul 7, 2021
Challenges for machine learning in clinical translation of big data imaging studies

Nicola K Dinsdale, Emma Bluemke, Vaanathi Sundaresan et al.

The combination of deep learning image analysis methods and large-scale imaging datasets offers many opportunities to imaging neuroscience and epidemiology. However, despite the success of deep learning when applied to many neuroimaging tasks, there remain barriers to the clinical translation of large-scale datasets and processing tools. Here, we examine the main challenges and the approaches that have been proposed to overcome them. We focus on issues relating to data availability, interpretability, evaluation and logistics, and discuss the challenges we believe must still be overcome for big data deep learning approaches to succeed fully outside the research field.

IV · Jun 2, 2021
Self-supervised Lesion Change Detection and Localisation in Longitudinal Multiple Sclerosis Brain Imaging

Minh-Son To, Ian G Sarno, Chee Chong et al.

Longitudinal imaging forms an essential component in the management and follow-up of many medical conditions. The presence of lesion changes on serial imaging can have a significant impact on clinical decision making, highlighting the important role of automated change detection. Lesion changes can represent anomalies in serial imaging, which implies limited availability of annotations and a wide variety of possible changes that need to be considered. Hence, we introduce a new unsupervised anomaly detection and localisation method trained exclusively with serial images that do not contain any lesion changes. Our training automatically synthesises lesion changes in serial images, introducing detection and localisation pseudo-labels that are used to self-supervise the training of our model. Given the rarity of these lesion changes in the synthesised images, we train the model with the imbalance-robust focal Tversky loss. When compared to supervised models trained on different datasets, our method shows competitive performance in the detection and localisation of new demyelinating lesions on longitudinal magnetic resonance imaging in multiple sclerosis patients. Code for the models will be made available on GitHub.
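The focal Tversky loss mentioned above handles rare lesion voxels in two ways. It weights false negatives and false positives separately through the Tversky index, TI = TP / (TP + α·FN + β·FP). It then raises 1 - TI to a power that changes how strongly poorly segmented cases are emphasised. The sketch below uses α = 0.7, β = 0.3 and an exponent of 0.75 (1/γ with γ = 4/3), commonly cited settings assumed here rather than taken from this paper. Note that some implementations write the exponent as γ directly.

```python
def tversky_index(probs, targets, alpha=0.7, beta=0.3, eps=1e-7):
    """Soft Tversky index over flattened voxel probabilities and 0/1 targets.
    alpha weights false negatives (missed lesion), beta weights false positives."""
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, exponent=0.75):
    """(1 - TI) ** exponent; 0 for a perfect prediction."""
    return (1.0 - tversky_index(probs, targets, alpha, beta)) ** exponent
```

With α > β, missing one of two lesion voxels costs more (loss ≈ 0.51) than adding one spurious voxel (loss ≈ 0.22). That asymmetry suits a task where new lesions are rare and easy to miss.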

IV · May 24, 2021
Brain tumour segmentation using a triplanar ensemble of U-Nets

Vaanathi Sundaresan, Ludovica Griffanti, Mark Jenkinson

Gliomas appear with wide variation in their characteristics both in terms of their appearance and location on brain MR images, which makes robust tumour segmentation highly challenging, and leads to high inter-rater variability even in manual segmentations. In this work, we propose a triplanar ensemble network, with an independent tumour core prediction module, for accurate segmentation of these tumours and their sub-regions. On evaluating our method on the MICCAI Brain Tumor Segmentation (BraTS) challenge validation dataset, for tumour sub-regions, we achieved a Dice similarity coefficient of 0.77 for both enhancing tumour (ET) and tumour core (TC). In the case of the whole tumour (WT) region, we achieved a Dice value of 0.89, which is on par with the top-ranking methods from BraTS'17-19. Our method achieved an evaluation score that was the equal 5th highest value (with our method ranking in 10th place) in the BraTS'20 challenge, with mean Dice values of 0.81, 0.89 and 0.84 on ET, WT and TC regions respectively on the BraTS'20 unseen test dataset.
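The results above are reported as Dice similarity coefficients, 2|A ∩ B| / (|A| + |B|). The sketch below computes Dice on flattened binary masks and shows one simple way to fuse a triplanar ensemble: average the per-voxel probabilities from the axial, sagittal and coronal models, then threshold. The fusion rule and the 0.5 threshold are assumptions. The abstract does not describe how the three planes, or the tumour core module, are combined.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two binary masks (flat lists of 0/1).
    Two empty masks are treated as perfect agreement (a common convention)."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * inter / total

def fuse_triplanar(axial, sagittal, coronal, threshold=0.5):
    """Hypothetical fusion: mean of three per-voxel probabilities, then binarise."""
    return [1 if (a + s + c) / 3.0 >= threshold else 0
            for a, s, c in zip(axial, sagittal, coronal)]
```

Averaging lets two confident planes outvote one uncertain plane, which is one reason planar ensembles tend to be more stable than any single 2D model.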