CVDec 16, 2022Code
Context Label Learning: Improving Background Class Representations in Semantic SegmentationZeju Li, Konstantinos Kamnitsas, Cheng Ouyang et al.
Background samples provide key contextual information for segmenting regions of interest (ROIs). However, they always cover a diverse set of structures, causing difficulties for the segmentation model to learn good decision boundaries with high sensitivity and precision. The issue concerns the highly heterogeneous nature of the background class, resulting in multi-modal distributions. Empirically, we find that neural networks trained with heterogeneous background struggle to map the corresponding contextual samples to compact clusters in feature space. As a result, the distribution over background logit activations may shift across the decision boundary, leading to systematic over-segmentation across different datasets and tasks. In this study, we propose context label learning (CoLab) to improve the context representations by decomposing the background class into several subclasses. Specifically, we train an auxiliary network as a task generator, along with the primary segmentation model, to automatically generate context labels that positively affect the ROI segmentation accuracy. Extensive experiments are conducted on several challenging segmentation tasks and datasets. The results demonstrate that CoLab can guide the segmentation model to map the logits of background samples away from the decision boundary, resulting in significantly improved segmentation accuracy. Code is available.
LGApr 22, 2022
Federated Learning Enables Big Data for Rare Cancer Boundary DetectionSarthak Pati, Ujjwal Baid, Brandon Edwards et al.
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
CVAug 31, 2023Code
Post-Deployment Adaptation with Access to Source Data via Federated Learning and Source-Target Remote Gradient AlignmentFelix Wagner, Zeju Li, Pramit Saha et al.
Deployment of Deep Neural Networks in medical imaging is hindered by distribution shift between training data and data processed after deployment, causing performance degradation. Post-Deployment Adaptation (PDA) addresses this by tailoring a pre-trained, deployed model to the target data distribution using limited labelled or entirely unlabelled target data, while assuming no access to source training data as they cannot be deployed with the model due to privacy concerns and their large size. This makes reliable adaptation challenging due to limited learning signal. This paper challenges this assumption and introduces FedPDA, a novel adaptation framework that brings the utility of learning from remote data from Federated Learning into PDA. FedPDA enables a deployed model to obtain information from source data via remote gradient exchange, while aiming to optimize the model specifically for the target domain. Tailored for FedPDA, we introduce a novel optimization method StarAlign (Source-Target Remote Gradient Alignment) that aligns gradients between source-target domain pairs by maximizing their inner product, to facilitate learning a target-specific model. We demonstrate the method's effectiveness using multi-center databases for the tasks of cancer metastases detection and skin lesion classification, where our method compares favourably to previous work. Code is available at: https://github.com/FelixWag/StarAlign
CVJul 20, 2022
Estimating Model Performance under Domain Shifts with Class-Specific Confidence ScoresZeju Li, Konstantinos Kamnitsas, Mobarakol Islam et al.
Machine learning models are typically deployed in a test setting that differs from the training setting, potentially leading to decreased model performance because of domain shift. If we could estimate the performance that a pre-trained model would achieve on data from a specific deployment setting, for example a certain clinic, we could judge whether the model could safely be deployed or if its performance degrades unacceptably on the specific data. Existing approaches estimate this based on the confidence of predictions made on unlabeled test data from the deployment's domain. We find existing methods struggle with data that present class imbalance, because the methods used to calibrate confidence do not account for bias induced by class imbalance, consequently failing to estimate class-wise accuracy. Here, we introduce class-wise calibration within the framework of performance estimation for imbalanced datasets. Specifically, we derive class-specific modifications of state-of-the-art confidence-based model evaluation methods including temperature scaling (TS), difference of confidences (DoC), and average thresholded confidence (ATC). We also extend the methods to estimate Dice similarity coefficient (DSC) in image segmentation. We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets. Our methods improve accuracy estimation by 18\% in classification under natural domain shifts, and double the estimation accuracy on segmentation tasks, when compared with prior methods.
IVAug 30, 2023
Modality Cycles with Masked Conditional Diffusion for Unsupervised Anomaly Segmentation in MRIZiyun Liang, Harry Anthony, Felix Wagner et al.
Unsupervised anomaly segmentation aims to detect patterns that are distinct from any patterns processed during training, commonly called abnormal or out-of-distribution patterns, without providing any associated manual segmentations. Since anomalies during deployment can lead to model failure, detecting the anomaly can enhance the reliability of models, which is valuable in high-risk domains like medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across diverse patterns in multimodal MRI. The method is based on two fundamental ideas. First, we propose the use of cyclic modality translation as a mechanism for enabling abnormality detection. Image-translation models learn tissue-specific modality mappings, which are characteristic of tissue physiology. Thus, these learned mappings fail to translate tissues or image patterns that have never been encountered during training, and the error enables their segmentation. Furthermore, we combine image translation with a masked conditional diffusion model, which attempts to `imagine' what tissue exists under a masked area, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task by training on healthy-looking slices of BraTS2021 multi-modality MRIs and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models.
CVJun 27, 2022
Distributional Gaussian Processes Layers for Out-of-Distribution DetectionSebastian G. Popescu, David J. Sharp, James H. Cole et al.
Machine learning models deployed on medical imaging tasks must be equipped with out-of-distribution detection capabilities in order to avoid erroneous predictions. It is unsure whether out-of-distribution detection models reliant on deep neural networks are suitable for detecting domain shifts in medical imaging. Gaussian Processes can reliably separate in-distribution data points from out-of-distribution data points via their mathematical construction. Hence, we propose a parameter efficient Bayesian layer for hierarchical convolutional Gaussian Processes that incorporates Gaussian Processes operating in Wasserstein-2 space to reliably propagate uncertainty. This directly replaces convolving Gaussian Processes with a distance-preserving affine operator on distributions. Our experiments on brain tissue-segmentation show that the resulting architecture approaches the performance of well-established deterministic segmentation algorithms (U-Net), which has not been achieved with previous hierarchical Gaussian Processes. Moreover, by applying the same segmentation model to out-of-distribution data (i.e., images with pathology such as brain tumors), we show that our uncertainty estimates result in out-of-distribution detection that outperforms the capabilities of previous Bayesian networks and reconstruction-based approaches that learn normative distributions. To facilitate future work our code is publicly available.
CVAug 30, 2024Code
Evaluating Reliability in Medical DNNs: A Critical Analysis of Feature and Confidence-Based OOD DetectionHarry Anthony, Konstantinos Kamnitsas
Reliable use of deep neural networks (DNNs) for medical image analysis requires methods to identify inputs that differ significantly from the training data, called out-of-distribution (OOD), to prevent erroneous predictions. OOD detection methods can be categorised as either confidence-based (using the model's output layer for OOD detection) or feature-based (not using the output layer). We created two new OOD benchmarks by dividing the D7P (dermatology) and BreastMNIST (ultrasound) datasets into subsets which either contain or don't contain an artefact (rulers or annotations respectively). Models were trained with artefact-free images, and images with the artefacts were used as OOD test sets. For each OOD image, we created a counterfactual by manually removing the artefact via image processing, to assess the artefact's impact on the model's predictions. We show that OOD artefacts can boost a model's softmax confidence in its predictions, due to correlations in training data among other factors. This contradicts the common assumption that OOD artefacts should lead to more uncertain outputs, an assumption on which most confidence-based methods rely. We use this to explain why feature-based methods (e.g. Mahalanobis score) typically have greater OOD detection performance than confidence-based methods (e.g. MCP). However, we also show that feature-based methods typically perform worse at distinguishing between inputs that lead to correct and incorrect predictions (for both OOD and ID data). Following from these insights, we argue that a combination of feature-based and confidence-based methods should be used within DNN pipelines to mitigate their respective weaknesses. These project's code and OOD benchmarks are available at: https://github.com/HarryAnthony/Evaluating_OOD_detection.
CVJul 3, 2024Code
An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image SynthesisMarawan Elbatel, Konstantinos Kamnitsas, Xiaomeng Li
Generative modeling seeks to approximate the statistical properties of real data, enabling synthesis of new data that closely resembles the original distribution. Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs) represent significant advancements in generative modeling, drawing inspiration from game theory and thermodynamics, respectively. Nevertheless, the exploration of generative modeling through the lens of biological evolution remains largely untapped. In this paper, we introduce a novel family of models termed Generative Cellular Automata (GeCA), inspired by the evolution of an organism from a single cell. GeCAs are evaluated as an effective augmentation tool for retinal disease classification across two imaging modalities: Fundus and Optical Coherence Tomography (OCT). In the context of OCT imaging, where data is scarce and the distribution of classes is inherently skewed, GeCA significantly boosts the performance of 11 different ophthalmological conditions, achieving a 12% increase in the average F1 score compared to conventional baselines. GeCAs outperform both diffusion methods that incorporate UNet or state-of-the art variants with transformer-based denoising models, under similar parameter constraints. Code is available at: https://github.com/xmed-lab/GeCA.
CVFeb 23Code
The Invisible Gorilla Effect in Out-of-distribution DetectionHarry Anthony, Ziyun Liang, Hermione Warr et al.
Deep Neural Networks achieve high performance in vision tasks by learning features from regions of interest (ROI) within images, but their performance degrades when deployed on out-of-distribution (OOD) data that differs from training data. This challenge has led to OOD detection methods that aim to identify and reject unreliable predictions. Although prior work shows that OOD detection performance varies by artefact type, the underlying causes remain underexplored. To this end, we identify a previously unreported bias in OOD detection: for hard-to-detect artefacts (near-OOD), detection performance typically improves when the artefact shares visual similarity (e.g. colour) with the model's ROI and drops when it does not - a phenomenon we term the Invisible Gorilla Effect. For example, in a skin lesion classifier with red lesion ROI, we show the method Mahalanobis Score achieves a 31.5% higher AUROC when detecting OOD red ink (similar to ROI) compared to black ink (dissimilar) annotations. We annotated artefacts by colour in 11,355 images from three public datasets (e.g. ISIC) and generated colour-swapped counterfactuals to rule out dataset bias. We then evaluated 40 OOD methods across 7 benchmarks and found significant performance drops for most methods when artefacts differed from the ROI. Our findings highlight an overlooked failure mode in OOD detection and provide guidance for more robust detectors. Code and annotations are available at: https://github.com/HarryAnthony/Invisible_Gorilla_Effect.
AIJul 31, 2024
Quality Control for Radiology Report Generation Models via Auxiliary Auditing ComponentsHermione Warr, Yasin Ibrahim, Daniel R. McGowan et al.
Automation of medical image interpretation could alleviate bottlenecks in diagnostic workflows, and has become of particular interest in recent years due to advancements in natural language processing. Great strides have been made towards automated radiology report generation via AI, yet ensuring clinical accuracy in generated reports is a significant challenge, hindering deployment of such methods in clinical practice. In this work we propose a quality control framework for assessing the reliability of AI-generated radiology reports with respect to semantics of diagnostic importance using modular auxiliary auditing components (AC). Evaluating our pipeline on the MIMIC-CXR dataset, our findings show that incorporating ACs in the form of disease-classifiers can enable auditing that identifies more reliable reports, resulting in higher F1 scores compared to unfiltered generated reports. Additionally, leveraging the confidence of the AC labels further improves the audit's effectiveness.
CVSep 11, 2025Code
Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During TrainingAnthony P. Addison, Felix Wagner, Wentian Xu et al.
Segmentation models are important tools for the detection and analysis of lesions in brain MRI. Depending on the type of brain pathology that is imaged, MRI scanners can acquire multiple, different image modalities (contrasts). Most segmentation models for multimodal brain MRI are restricted to fixed modalities and cannot effectively process new ones at inference. Some models generalize to unseen modalities but may lose discriminative modality-specific information. This work aims to develop a model that can perform inference on data that contain image modalities unseen during training, previously seen modalities, and heterogeneous combinations of both, thus allowing a user to utilize any available imaging modalities. We demonstrate this is possible with a simple, thus practical alteration to the U-net architecture, by integrating a modality-agnostic input channel or pathway, alongside modality-specific input channels. To train this modality-agnostic component, we develop an image augmentation scheme that synthesizes artificial MRI modalities. Augmentations differentially alter the appearance of pathological and healthy brain tissue to create artificial contrasts between them while maintaining realistic anatomical integrity. We evaluate the method using 8 MRI databases that include 5 types of pathologies (stroke, tumours, traumatic brain injury, multiple sclerosis and white matter hyperintensities) and 8 modalities (T1, T1+contrast, T2, PD, SWI, DWI, ADC and FLAIR). The results demonstrate that the approach preserves the ability to effectively process MRI modalities encountered during training, while being able to process new, unseen modalities to improve its segmentation. Project code: https://github.com/Anthony-P-Addison/AGN-MOD-SEG
CVJun 10, 2025Code
DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical ImagingFelix Wagner, Pramit Saha, Harry Anthony et al.
Safe deployment of machine learning (ML) models in safety-critical domains such as medical imaging requires detecting inputs with characteristics not seen during training, known as out-of-distribution (OOD) detection, to prevent unreliable predictions. Effective OOD detection after deployment could benefit from access to the training data, enabling direct comparison between test samples and the training data distribution to identify differences. State-of-the-art OOD detection methods, however, either discard the training data after deployment or assume that test samples and training data are centrally stored together, an assumption that rarely holds in real-world settings. This is because shipping the training data with the deployed model is usually impossible due to the size of training databases, as well as proprietary or privacy constraints. We introduce the Isolation Network, an OOD detection framework that quantifies the difficulty of separating a target test sample from the training data by solving a binary classification task. We then propose Decentralized Isolation Networks (DIsoN), which enables the comparison of training and test data when data-sharing is impossible, by exchanging only model parameters between the remote computational nodes of training and deployment. We further extend DIsoN with class-conditioning, comparing a target sample solely with training data of its predicted class. We evaluate DIsoN on four medical imaging datasets (dermatology, chest X-ray, breast ultrasound, histopathology) across 12 OOD detection tasks. DIsoN performs favorably against existing methods while respecting data-privacy. This decentralized OOD detection framework opens the way for a new type of service that ML developers could provide along with their models: providing remote, secure utilization of their training data for OOD detection services. Code: https://github.com/FelixWag/DIsoN
CVApr 7, 2025Code
IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MRZiyun Liang, Xiaoqing Guo, Wentian Xu et al.
Unsupervised anomaly detection and segmentation methods train a model to learn the training distribution as `normal'. In the testing phase, they identify patterns that deviate from this normal distribution as `anomalies'. To learn the `normal' distribution, prevailing methods corrupt the images and train a model to reconstruct them. During testing, the model attempts to reconstruct corrupted inputs based on the learned `normal' distribution. Deviations from this distribution lead to high reconstruction errors, which indicate potential anomalies. However, corrupting an input image inevitably causes information loss even in normal regions, leading to suboptimal reconstruction and an increased risk of false positives. To alleviate this, we propose $\rm{IterMask3D}$, an iterative spatial mask-refining strategy designed for 3D brain MRI. We iteratively spatially mask areas of the image as corruption and reconstruct them, then shrink the mask based on reconstruction error. This process iteratively unmasks `normal' areas to the model, whose information further guides reconstruction of `normal' patterns under the mask to be reconstructed accurately, reducing false positives. In addition, to achieve better reconstruction performance, we also propose using high-frequency image content as additional structural information to guide the reconstruction of the masked area. Extensive experiments on the detection of both synthetic and real-world imaging artifacts, as well as segmentation of various pathological lesions across multiple MRI sequences, consistently demonstrate the effectiveness of our proposed method. Code is available at https://github.com/ZiyunLiang/IterMask3D.
IVJun 17, 2024Code
Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI ModalitiesFelix Wagner, Wentian Xu, Pramit Saha et al.
Segmentation models for brain lesions in MRI are typically developed for a specific disease and trained on data with a predefined set of MRI modalities. Such models cannot segment the disease using data with a different set of MRI modalities, nor can they segment other types of diseases. Moreover, this training paradigm prevents a model from using the advantages of learning from heterogeneous databases that may contain scans and segmentation labels for different brain pathologies and diverse sets of MRI modalities. Additionally, the confidentiality of patient data often prevents central data aggregation, necessitating a decentralized approach. Is it feasible to use Federated Learning (FL) to train a single model on client databases that contain scans and labels of different brain pathologies and diverse sets of MRI modalities? We demonstrate promising results by combining appropriate, simple, and practical modifications to the model and training strategy: Designing a model with input channels that cover the whole set of modalities available across clients, training with random modality drop, and exploring the effects of feature normalization methods. Evaluation on 7 brain MRI databases with 5 different diseases shows that this FL framework can train a single model achieving very promising results in segmenting all disease types seen during training. Importantly, it can segment these diseases in new databases that contain sets of modalities different from those in training clients. These results demonstrate, for the first time, the feasibility and effectiveness of using FL to train a single 3D segmentation model on decentralised data with diverse brain diseases and MRI modalities, a necessary step towards leveraging heterogeneous real-world databases. Code: https://github.com/FelixWag/FedUniBrain
IVJun 4, 2024Code
IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRIZiyun Liang, Xiaoqing Guo, J. Alison Noble et al.
Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMask2
CVMar 19, 2024Code
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?Anjun Hu, Jindong Gu, Francesco Pinto et al.
Foundation models pre-trained on web-scale vision-language data, such as CLIP, are widely used as cornerstones of powerful machine learning systems. While pre-training offers clear advantages for downstream learning, it also endows downstream models with shared adversarial vulnerabilities that can be easily identified through the open-sourced foundation model. In this work, we expose such vulnerabilities in CLIP's downstream models and show that foundation models can serve as a basis for attacking their downstream systems. In particular, we propose a simple yet effective adversarial attack strategy termed Patch Representation Misalignment (PRM). Solely based on open-sourced CLIP vision encoders, this method produces adversaries that simultaneously fool more than 20 downstream models spanning 4 common vision-language tasks (semantic segmentation, object detection, image captioning and visual question-answering). Our findings highlight the concerning safety risks introduced by the extensive usage of public foundational models in the development of downstream systems, calling for extra caution in these scenarios.
CVSep 4, 2023Code
On the use of Mahalanobis distance for out-of-distribution detection with neural networks for medical imagingHarry Anthony, Konstantinos Kamnitsas
Implementing neural networks for clinical use in medical applications necessitates the ability for the network to detect when input data differs significantly from the training data, with the aim of preventing unreliable predictions. The community has developed several methods for out-of-distribution (OOD) detection, within which distance-based approaches - such as Mahalanobis distance - have shown potential. This paper challenges the prevailing community understanding that there is an optimal layer, or combination of layers, of a neural network for applying Mahalanobis distance for detection of any OOD pattern. Using synthetic artefacts to emulate OOD patterns, this paper shows the optimum layer to apply Mahalanobis distance changes with the type of OOD pattern, showing there is no one-fits-all solution. This paper also shows that separating this OOD detector into multiple detectors at different depths of the network can enhance the robustness for detecting different OOD patterns. These insights were validated on real-world OOD tasks, training models on CheXpert chest X-rays with no support devices, then using scans with unseen pacemakers (we manually labelled 50% of CheXpert for this research) and unseen sex as OOD cases. The results inform best-practices for the use of Mahalanobis distance for OOD detection. The manually annotated pacemaker labels and the project's code are available at: https://github.com/HarryAnthony/Mahalanobis-OOD-detection.
LGFeb 7, 2024
Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease DetectionPramit Saha, Divyanshu Mishra, Felix Wagner et al.
Multimodal Federated Learning (MMFL) utilizes multiple modalities in each client to build a more powerful Federated Learning (FL) model than its unimodal counterpart. However, the impact of missing modality in different clients, also called modality incongruity, has been greatly overlooked. This paper, for the first time, analyses the impact of modality incongruity and reveals its connection with data heterogeneity across participating clients. We particularly inspect whether incongruent MMFL with unimodal and multimodal clients is more beneficial than unimodal FL. Furthermore, we examine three potential routes of addressing this issue. Firstly, we study the effectiveness of various self-attention mechanisms towards incongruity-agnostic information fusion in MMFL. Secondly, we introduce a modality imputation network (MIN) pre-trained in a multimodal client for modality translation in unimodal clients and investigate its potential towards mitigating the missing modality problem. Thirdly, we assess the capability of client-level and server-level regularization techniques towards mitigating modality incongruity effects. Experiments are conducted under several MMFL settings on two publicly available real-world datasets, MIMIC-CXR and Open-I, with Chest X-Ray and radiology reports.
CVDec 19, 2024
FedPIA -- Permuting and Integrating Adapters leveraging Wasserstein Barycenters for Finetuning Foundation Models in Multi-Modal Federated LearningPramit Saha, Divyanshu Mishra, Felix Wagner et al.
Large Vision-Language Models typically require large text and image datasets for effective fine-tuning. However, collecting data from various sites, especially in healthcare, is challenging due to strict privacy regulations. An alternative is to fine-tune these models on end-user devices, such as in medical clinics, without sending data to a server. These local clients typically have limited computing power and small datasets, which are not enough for fully fine-tuning large VLMs on their own. A naive solution to these scenarios is to leverage parameter-efficient fine-tuning (PEFT) strategies and apply federated learning (FL) algorithms to combine the learned adapter weights, thereby respecting the resource limitations and data privacy. However, this approach does not fully leverage the knowledge from multiple adapters trained on diverse data distributions and for diverse tasks. The adapters are adversely impacted by data heterogeneity and task heterogeneity across clients resulting in suboptimal convergence. To this end, we propose a novel framework called FedPIA that improves upon the naive combinations of FL and PEFT by introducing Permutation and Integration of the local Adapters in the server and global Adapters in the clients exploiting Wasserstein barycenters for improved blending of client-specific and client-agnostic knowledge. This layerwise permutation helps to bridge the gap in the parameter space of local and global adapters before integration. We conduct over 2000 client-level experiments utilizing 48 medical image datasets across five different medical vision-language FL task settings encompassing visual question answering as well as image and report-based multi-label disease detection. Our experiments involving diverse client settings, ten different modalities, and two VLM backbones demonstrate that FedPIA consistently outperforms the state-of-the-art PEFT-FL baselines.
LGMar 27, 2024
Semi-Supervised Learning for Deep Causal Generative ModelsYasin Ibrahim, Hermione Warr, Konstantinos Kamnitsas
Developing models that are capable of answering questions of the form "How would x change if y had been z?'" is fundamental to advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that the corresponding labels are available in the training data. However, clinical data may not have complete records for all patients and state of the art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.
IVNov 23, 2024
SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image SegmentationJiayuan Zhu, Junde Wu, Cheng Ouyang et al.
Medical image segmentation data inherently contain uncertainty. This can stem from both imperfect image quality and variability in labeling preferences on ambiguous pixels, which depend on annotator expertise and the clinical context of the annotations. For instance, a boundary pixel might be labeled as tumor in diagnosis to avoid under-estimation of severity, but as normal tissue in radiotherapy to prevent damage to sensitive structures. As segmentation preferences vary across downstream applications, it is often desirable for an image segmentation model to offer user-adaptable predictions rather than a fixed output. While prior uncertainty-aware and interactive methods offer adaptability, they are inefficient at test time: uncertainty-aware models require users to choose from numerous similar outputs, while interactive models demand significant user input through click or box prompts to refine segmentation. To address these challenges, we propose \textbf{SPA}, a new \textbf{S}egmentation \textbf{P}reference \textbf{A}lignment framework that efficiently adapts to diverse test-time preferences with minimal human interaction. By presenting users with a select few, distinct segmentation candidates that best capture uncertainties, it reduces the user workload to reach the preferred segmentation. To accommodate user preference, we introduce a probabilistic mechanism that leverages user feedback to adapt a model's segmentation preference. The proposed framework is evaluated on several medical image segmentation tasks: color fundus images, lung lesion and kidney CT scans, MRI scans of brain and prostate. SPA shows 1) a significant reduction in user time and effort compared to existing interactive segmentation approaches, 2) strong adaptability based on human feedback, and 3) state-of-the-art image segmentation performance across different imaging modalities and semantic labels.
CVNov 17, 2024
F$^3$OCUS -- Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-HeuristicsPramit Saha, Felix Wagner, Divyanshu Mishra et al.
Effective training of large Vision-Language Models (VLMs) on resource-constrained client devices in Federated Learning (FL) requires the usage of parameter-efficient fine-tuning (PEFT) strategies. To this end, we demonstrate the impact of two factors \textit{viz.}, client-specific layer importance score that selects the most important VLM layers for fine-tuning and inter-client layer diversity score that encourages diverse layer selection across clients for optimal VLM layer selection. We first theoretically motivate and leverage the principal eigenvalue magnitude of layerwise Neural Tangent Kernels and show its effectiveness as client-specific layer importance score. Next, we propose a novel layer updating strategy dubbed F$^3$OCUS that jointly optimizes the layer importance and diversity factors by employing a data-free, multi-objective, meta-heuristic optimization on the server. We explore 5 different meta-heuristic algorithms and compare their effectiveness for selecting model layers and adapter layers towards PEFT-FL. Furthermore, we release a new MedVQA-FL dataset involving overall 707,962 VQA triplets and 9 modality-specific clients and utilize it to train and evaluate our method. Overall, we conduct more than 10,000 client-level experiments on 6 Vision-Language FL task settings involving 58 medical image datasets and 4 different VLM architectures of varying sizes to demonstrate the effectiveness of the proposed method.
CVOct 15, 2025
Unsupervised Domain Adaptation via Content Alignment for Hippocampus SegmentationHoda Kalabizadeh, Ludovica Griffanti, Pak-Hei Yeung et al.
Deep learning models for medical image segmentation often struggle when deployed across different datasets due to domain shifts - variations in both image appearance, known as style, and population-dependent anatomical characteristics, referred to as content. This paper presents a novel unsupervised domain adaptation framework that directly addresses domain shifts encountered in cross-domain hippocampus segmentation from MRI, with specific emphasis on content variations. Our approach combines efficient style harmonisation through z-normalisation with a bidirectional deformable image registration (DIR) strategy. The DIR network is jointly trained with segmentation and discriminator networks to guide the registration with respect to a region of interest and generate anatomically plausible transformations that align source images to the target domain. We validate our approach through comprehensive evaluations on both a synthetic dataset using Morpho-MNIST (for controlled validation of core principles) and three MRI hippocampus datasets representing populations with varying degrees of atrophy. Across all experiments, our method outperforms existing baselines. For hippocampus segmentation, when transferring from young, healthy populations to clinical dementia patients, our framework achieves up to 15% relative improvement in Dice score compared to standard augmentation methods, with the largest gains observed in scenarios with substantial content shift. These results highlight the efficacy of our approach for accurate hippocampus segmentation across diverse populations.
CLAug 13, 2025
Specialised or Generic? Tokenization Choices for Radiology Language ModelsHermione Warr, Wentian Xu, Harry Anthony et al.
The vocabulary used by language models (LM) - defined by the tokenizer - plays a key role in text generation quality. However, its impact remains under-explored in radiology. In this work, we address this gap by systematically comparing general, medical, and domain-specific tokenizers on the task of radiology report summarisation across three imaging modalities. We also investigate scenarios with and without LM pre-training on PubMed abstracts. Our findings demonstrate that medical and domain-specific vocabularies outperformed widely used natural language alternatives when models are trained from scratch. Pre-training partially mitigates performance differences between tokenizers, whilst the domain-specific tokenizers achieve the most favourable results. Domain-specific tokenizers also reduce memory requirements due to smaller vocabularies and shorter sequences. These results demonstrate that adapting the vocabulary of LMs to the clinical domain provides practical benefits, including improved performance and reduced computational demands, making such models more accessible and effective for both research and real-world healthcare settings.
CVMar 9, 2025
Continuous Online Adaptation Driven by User Interaction for Medical Image SegmentationWentian Xu, Ziyun Liang, Harry Anthony et al.
Interactive segmentation models use real-time user interactions, such as mouse clicks, as extra inputs to dynamically refine the model predictions. After model deployment, user corrections of model predictions could be used to adapt the model to the post-deployment data distribution, countering distribution-shift and enhancing reliability. Motivated by this, we introduce an online adaptation framework that enables an interactive segmentation model to continuously learn from user interaction and improve its performance on new data distributions, as it processes a sequence of test images. We introduce the Gaussian Point Loss function to train the model how to leverage user clicks, along with a two-stage online optimization method that adapts the model using the corrected predictions generated via user interactions. We demonstrate that this simple and therefore practical approach is very effective. Experiments on 5 fundus and 4 brain MRI databases demonstrate that our method outperforms existing approaches under various data distribution shifts, including segmentation of image modalities and pathologies not seen during training.
CVMay 30, 2023
Joint Optimization of Class-Specific Training- and Test-Time Data Augmentation in SegmentationZeju Li, Konstantinos Kamnitsas, Qi Dou et al.
This paper presents an effective and general data augmentation framework for medical image segmentation. We adopt a computationally efficient and data-efficient gradient-based meta-learning scheme to explicitly align the distribution of training and validation data which is used as a proxy for unseen test data. We improve the current data augmentation strategies with two core designs. First, we learn class-specific training-time data augmentation (TRA) effectively increasing the heterogeneity within the training subsets and tackling the class imbalance common in segmentation. Second, we jointly optimize TRA and test-time data augmentation (TEA), which are closely connected as both aim to align the training and test data distribution but were so far considered separately in previous works. We demonstrate the effectiveness of our method on four medical image segmentation tasks across different scenarios with two state-of-the-art segmentation models, DeepMedic and nnU-Net. Extensive experimentation shows that the proposed data augmentation framework can significantly and consistently improve the segmentation performance when compared to existing solutions. Code is publicly available.
CVJul 19, 2021
Transductive image segmentation: Self-training and effect of uncertainty estimationKonstantinos Kamnitsas, Stefan Winzeck, Evgenios N. Kornaropoulos et al.
Semi-supervised learning (SSL) uses unlabeled data during training to learn better models. Previous studies on SSL for medical image segmentation focused mostly on improving model generalization to unseen data. In some applications, however, our primary interest is not generalization but to obtain optimal predictions on a specific unlabeled database that is fully available during model development. Examples include population studies for extracting imaging phenotypes. This work investigates an often overlooked aspect of SSL, transduction. It focuses on the quality of predictions made on the unlabeled data of interest when they are included for optimization during training, rather than improving generalization. We focus on the self-training framework and explore its potential for transduction. We analyze it through the lens of Information Gain and reveal that learning benefits from the use of calibrated or under-confident models. Our extensive experiments on a large MRI database for multi-class segmentation of traumatic brain lesions shows promising results when comparing transductive with inductive predictions. We believe this study will inspire further research on transductive learning, a well-suited paradigm for medical image analysis.
CVJul 13, 2021
Learning from Partially Overlapping Labels: Image Segmentation under Annotation ShiftGregory Filbrandt, Konstantinos Kamnitsas, David Bernstein et al.
Scarcity of high quality annotated images remains a limiting factor for training accurate image segmentation models. While more and more annotated datasets become publicly available, the number of samples in each individual database is often small. Combining different databases to create larger amounts of training data is appealing yet challenging due to the heterogeneity as a result of differences in data acquisition and annotation processes, often yielding incompatible or even conflicting information. In this paper, we investigate and propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation. We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data and substantially improve segmentation accuracy compared to baseline and alternative approaches.
CVJul 6, 2021
Confidence-based Out-of-Distribution Detection: A Comparative Study and AnalysisChristoph Berger, Magdalini Paschali, Ben Glocker et al.
Image classification models deployed in the real world may receive inputs outside the intended data distribution. For critical applications such as clinical decision making, it is important that a model can detect such out-of-distribution (OOD) inputs and express its uncertainty. In this work, we assess the capability of various state-of-the-art approaches for confidence-based OOD detection through a comparative study and in-depth analysis. First, we leverage a computer vision benchmark to reproduce and compare multiple OOD detection methods. We then evaluate their capabilities on the challenging task of disease classification using chest X-rays. Our study shows that high performance in a computer vision task does not directly translate to accuracy in a medical imaging task. We analyse factors that affect performance of the methods between the two tasks. Our results provide useful insights for developing the next generation of OOD detection methods.
MLApr 28, 2021
Distributional Gaussian Process Layers for Outlier Detection in Image SegmentationSebastian G. Popescu, David J. Sharp, James H. Cole et al.
We propose a parameter efficient Bayesian layer for hierarchical convolutional Gaussian Processes that incorporates Gaussian Processes operating in Wasserstein-2 space to reliably propagate uncertainty. This directly replaces convolving Gaussian Processes with a distance-preserving affine operator on distributions. Our experiments on brain tissue-segmentation show that the resulting architecture approaches the performance of well-established deterministic segmentation algorithms (U-Net), which has never been achieved with previous hierarchical Gaussian Processes. Moreover, by applying the same segmentation model to out-of-distribution data (i.e., images with pathology such as brain tumors), we show that our uncertainty estimates result in out-of-distribution detection that outperforms the capabilities of previous Bayesian networks and reconstruction-based approaches that learn normative distributions.
CVFeb 20, 2021
Analyzing Overfitting under Class Imbalance in Neural Networks for Image SegmentationZeju Li, Konstantinos Kamnitsas, Ben Glocker
Class imbalance poses a challenge for developing unbiased, accurate predictive models. In particular, in image segmentation neural networks may overfit to the foreground samples from small structures, which are often heavily under-represented in the training set, leading to poor generalization. In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior. We find empirically that when training with limited data and strong class imbalance, at test time the distribution of logit activations may shift across the decision boundary, while samples of the well-represented class seem unaffected. This bias leads to a systematic under-segmentation of small structures. This phenomenon is consistently observed for different databases, tasks and network architectures. To tackle this problem, we introduce new asymmetric variants of popular loss functions and regularization techniques including a large margin loss, focal loss, adversarial training, mixup and data augmentation, which are explicitly designed to counter logit shift of the under-represented classes. Extensive experiments are conducted on several challenging segmentation tasks. Our results demonstrate that the proposed modifications to the objective function can lead to significantly improved segmentation accuracy compared to baselines and alternative approaches.
CVJun 10, 2020
Stochastic Segmentation Networks: Modelling Spatially Correlated Aleatoric UncertaintyMiguel Monteiro, Loïc Le Folgoc, Daniel Coelho de Castro et al.
In image segmentation, there is often more than one plausible solution for a given input. In medical imaging, for example, experts will often disagree about the exact location of object boundaries. Estimating this inherent uncertainty and predicting multiple plausible hypotheses is of great interest in many applications, yet this ability is lacking in most current deep learning methods. In this paper, we introduce stochastic segmentation networks (SSNs), an efficient probabilistic method for modelling aleatoric uncertainty with any image segmentation network architecture. In contrast to approaches that produce pixel-wise estimates, SSNs model joint distributions over entire label maps and thus can generate multiple spatially coherent hypotheses for a single image. By using a low-rank multivariate normal distribution over the logit space to model the probability of the label map given the image, we obtain a spatially consistent probability distribution that can be efficiently computed by a neural network without any changes to the underlying architecture. We tested our method on the segmentation of real-world medical data, including lung nodules in 2D CT and brain tumours in 3D multimodal MRI scans. SSNs outperform state-of-the-art for modelling correlated uncertainty in ambiguous images while being much simpler, more flexible, and more efficient.
CVOct 29, 2019
Domain Generalization via Model-Agnostic Learning of Semantic FeaturesQi Dou, Daniel C. Castro, Konstantinos Kamnitsas et al.
Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge about inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.
LGJul 25, 2019
Overfitting of neural nets under class imbalance: Analysis and improvements for segmentationZeju Li, Konstantinos Kamnitsas, Ben Glocker
Overfitting in deep learning has been the focus of a number of recent works, yet its exact impact on the behavior of neural networks is not well understood. This study analyzes overfitting by examining how the distribution of logits alters in relation to how much the model overfits. Specifically, we find that when training with few data samples, the distribution of logit activations when processing unseen test samples of an under-represented class tends to shift towards and even across the decision boundary, while the over-represented class seems unaffected. In image segmentation, foreground samples are often heavily under-represented. We observe that sensitivity of the model drops as a result of overfitting, while precision remains mostly stable. Based on our analysis, we derive asymmetric modifications of existing loss functions and regularizers including a large margin loss, focal loss, adversarial training and mixup, which specifically aim at reducing the shift observed when embedding unseen samples of the under-represented class. We study the case of binary segmentation of brain tumor core and show that our proposed simple modifications lead to significantly improved segmentation performance over the symmetric variants.
IVJul 5, 2019
Data Efficient Unsupervised Domain Adaptation for Cross-Modality Image SegmentationCheng Ouyang, Konstantinos Kamnitsas, Carlo Biffi et al.
Deep learning models trained on medical images from a source domain (e.g. imaging modality) often fail when deployed on images from a different target domain, despite imaging common anatomical structures. Deep unsupervised domain adaptation (UDA) aims to improve the performance of a deep neural network model on a target domain, using solely unlabelled target domain data and labelled source domain data. However, current state-of-the-art methods exhibit reduced performance when target data is scarce. In this work, we introduce a new data efficient UDA method for multi-domain medical image segmentation. The proposed method combines a novel VAE-based feature prior matching, which is data-efficient, and domain adversarial training to learn a shared domain-invariant latent space which is exploited during segmentation. Our method is evaluated on a public multi-modality cardiac image segmentation dataset by adapting from the labelled source domain (3D MRI) to the unlabelled target domain (3D CT). We show that by using only one single unlabelled 3D CT scan, the proposed architecture outperforms the state-of-the-art in the same setting. Finally, we perform ablation studies on prior matching and domain adversarial training to shed light on the theoretical grounding of the proposed method.
CVJun 30, 2019
Multiple Landmark Detection using Multi-Agent Reinforcement LearningAthanasios Vlontzos, Amir Alansary, Konstantinos Kamnitsas et al.
The detection of anatomical landmarks is a vital step for medical image analysis and applications for diagnosis, interpretation and guidance. Manual annotation of landmarks is a tedious process that requires domain-specific expertise and introduces inter-observer variability. This paper proposes a new detection approach for multiple landmarks based on multi-agent reinforcement learning. Our hypothesis is that the position of all anatomical landmarks is interdependent and non-random within the human anatomy, thus finding one landmark can help to deduce the location of others. Using a Deep Q-Network (DQN) architecture we construct an environment and agent with implicit inter-communication such that we can accommodate K agents acting and learning simultaneously, while they attempt to detect K different landmarks. During training the agents collaborate by sharing their accumulated knowledge for a collective gain. We compare our approach with state-of-the-art architectures and achieve significantly better accuracy by reducing the detection error by 50%, while requiring fewer computational resources and time to train compared to the naive approach of training K agents separately.
IVJun 28, 2019
Explainable Anatomical Shape Analysis through Deep Hierarchical Generative ModelsCarlo Biffi, Juan J. Cerrolaza, Giacomo Tarroni et al.
Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack interpretability in the feature extraction and decision processes. In this work, we propose a new interpretable deep learning model for shape analysis. In particular, we exploit deep generative networks to model a population of anatomical segmentations through a hierarchy of conditional latent variables. At the highest level of this hierarchy, a two-dimensional latent space is simultaneously optimised to discriminate distinct clinical conditions, enabling the direct visualisation of the classification space. Moreover, the anatomical variability encoded by this discriminative latent space can be visualised in the segmentation space thanks to the generative properties of the model, making the classification task transparent. This approach yielded high accuracy in the categorisation of healthy and remodelled left ventricles when tested on unseen segmentations from our own multi-centre dataset as well as in an external validation set, and on hippocampi from healthy controls and patients with Alzheimer's disease when tested on ADNI data. More importantly, it enabled the visualisation in three-dimensions of both global and regional anatomical features which better discriminate between the conditions under exam. The proposed approach scales effectively to large populations, facilitating high-throughput analysis of normal anatomy and pathology in large-scale studies of volumetric imaging.
CVNov 6, 2018
Towards continual learning in medical imagingChaitanya Baweja, Ben Glocker, Konstantinos Kamnitsas
This work investigates continual learning of two segmentation tasks in brain MRI with neural networks. To explore in this context the capabilities of current methods for countering catastrophic forgetting of the first task when a new one is learned, we investigate elastic weight consolidation, a recently proposed method based on Fisher information, originally evaluated on reinforcement learning of Atari games. We use it to sequentially learn segmentation of normal brain structures and then segmentation of white matter lesions. Our findings show this recent method reduces catastrophic forgetting, while large room for improvement exists in these challenging settings for continual learning.
CVNov 5, 2018
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS ChallengeSpyridon Bakas, Mauricio Reyes, Andras Jakab et al.
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
CVOct 24, 2018
Generative adversarial networks and adversarial methods in biomedical image analysisJelmer M. Wolterink, Konstantinos Kamnitsas, Christian Ledig et al.
Generative adversarial networks (GANs) and other adversarial methods are based on a game-theoretical perspective on joint optimization of two neural networks as players in a game. Adversarial techniques have been extensively used to synthesize and analyze biomedical images. We provide an introduction to GANs and adversarial methods, with an overview of biomedical image analysis tasks that have benefited from such methods. We conclude with a discussion of strengths and limitations of adversarial methods in biomedical image analysis, and propose potential future research directions.
CVJun 8, 2018
Automatic View Planning with Multi-scale Deep Reinforcement Learning AgentsAmir Alansary, Loic Le Folgoc, Ghislain Vaillant et al.
We propose a fully automatic method to find standardized view planes in 3D image acquisitions. Standard view images are important in clinical practice as they provide a means to perform biometric measurements from similar anatomical regions. These views are often constrained to the native orientation of a 3D image acquisition. Navigating through target anatomy to find the required view plane is tedious and operator-dependent. For this task, we employ a multi-scale reinforcement learning (RL) agent framework and extensively evaluate several Deep Q-Network (DQN) based strategies. RL enables a natural learning paradigm by interaction with the environment, which can be used to mimic experienced operators. We evaluate our results using the distance between the anatomical landmarks and detected planes, and the angles between their normal vector and target. The proposed algorithm is assessed on the mid-sagittal and anterior-posterior commissure planes of brain MRI, and the 4-chamber long-axis plane commonly used in cardiac MRI, achieving accuracy of 1.53mm, 1.98mm and 4.84mm, respectively.
LGJun 7, 2018
Semi-Supervised Learning via Compact Latent Space ClusteringKonstantinos Kamnitsas, Daniel C. Castro, Loic Le Folgoc et al.
We present a novel cost function for semi-supervised learning of neural networks that encourages compact clustering of the latent space to facilitate separation. The key idea is to dynamically create a graph over embeddings of labeled and unlabeled samples of a training batch to capture underlying structure in feature space, and use label propagation to estimate its high and low density regions. We then devise a cost function based on Markov chains on the graph that regularizes the latent space to form a single compact cluster per class, while avoiding to disturb existing clusters during optimization. We evaluate our approach on three benchmarks and compare to state-of-the art with promising results. Our approach combines the benefits of graph-based regularization with efficient, inductive inference, does not require modifications to a network architecture, and can thus be easily applied to existing networks to enable an effective use of unlabeled data.
CVJun 1, 2018
Domain Adaptation for MRI Organ Segmentation using Reverse Classification AccuracyVanya V. Valindria, Ioannis Lavdas, Wenjia Bai et al.
The variations in multi-center data in medical imaging studies have brought the necessity of domain adaptation. Despite the advancement of machine learning in automatic segmentation, performance often degrades when algorithms are applied on new data acquired from different scanners or sequences than the training data. Manual annotation is costly and time consuming if it has to be carried out for every new target domain. In this work, we investigate automatic selection of suitable subjects to be annotated for supervised domain adaptation using the concept of reverse classification accuracy (RCA). RCA predicts the performance of a trained model on data from the new domain and different strategies of selecting subjects to be included in the adaptation via transfer learning are evaluated. We perform experiments on a two-center MR database for the task of organ segmentation. We show that subject selection via RCA can reduce the burden of annotation of new data for the target domain.
CVMay 22, 2018
Autofocus Layer for Semantic SegmentationYao Qin, Konstantinos Kamnitsas, Siddharth Ancha et al.
We propose the autofocus convolutional layer for semantic segmentation with the objective of enhancing the capabilities of neural networks for multi-scale processing. Autofocus layers adaptively change the size of the effective receptive field based on the processed context to generate more powerful features. This is achieved by parallelising multiple convolutional layers with different dilation rates, combined by an attention mechanism that learns to focus on the optimal scales driven by context. By sharing the weights of the parallel convolutions we make the network scale-invariant, with only a modest increase in the number of parameters. The proposed autofocus layer can be easily integrated into existing networks to improve a model's representational power. We evaluate our models on the challenging tasks of multi-organ segmentation in pelvic CT and brain tumor segmentation in MRI and achieve very promising performance.
CVNov 4, 2017
Ensembles of Multiple Models and Architectures for Robust Brain Tumour SegmentationKonstantinos Kamnitsas, Wenjia Bai, Enzo Ferrante et al.
Deep learning approaches such as convolutional neural nets have consistently outperformed previous methods on challenging tasks such as dense, semantic segmentation. However, the various proposed networks perform differently, with behaviour largely influenced by architectural choices and training settings. This paper explores Ensembles of Multiple Models and Architectures (EMMA) for robust performance through aggregation of predictions from a wide range of methods. The approach reduces the influence of the meta-parameters of individual models and the risk of overfitting the configuration to a particular database. EMMA can be seen as an unbiased, generic deep learning model which is shown to yield excellent performance, winning the first position in the BRATS 2017 competition among 50+ participating teams.
CVMay 22, 2017
Anatomically Constrained Neural Networks (ACNN): Application to Cardiac Image Enhancement and SegmentationOzan Oktay, Enzo Ferrante, Konstantinos Kamnitsas et al.
Incorporation of prior knowledge about organ shape and location is key to improve performance of image analysis approaches. In particular, priors can be useful in cases where images are corrupted and contain artefacts due to limitations in image acquisition. The highly constrained nature of anatomical objects can be well captured with learning based techniques. However, in most recent and promising techniques such as CNN based segmentation it is not obvious how to incorporate such prior knowledge. State-of-the-art methods operate as pixel-wise classifiers where the training objectives do not incorporate the structure and inter-dependencies of the output. To overcome this limitation, we propose a generic training strategy that incorporates anatomical prior knowledge into CNNs through a new regularisation model, which is trained end-to-end. The new framework encourages models to follow the global anatomical properties of the underlying anatomy (e.g. shape, label structure) via learned non-linear representations of the shape. We show that the proposed approach can be easily adapted to different analysis tasks (e.g. image enhancement, segmentation) and improve the prediction accuracy of the state-of-the-art models. The applicability of our approach is shown on multi-modal cardiac datasets and public benchmarks. Additionally, we demonstrate how the learned deep models of 3D shapes can be interpreted and used as biomarkers for classification of cardiac pathologies.
CVFeb 28, 2017
Context-Sensitive Super-Resolution for Fast Fetal Magnetic Resonance ImagingSteven McDonagh, Benjamin Hou, Konstantinos Kamnitsas et al.
3D Magnetic Resonance Imaging (MRI) is often a trade-off between fast but low-resolution image acquisition and highly detailed but slow image acquisition. Fast imaging is required for targets that move to avoid motion artefacts. This is in particular difficult for fetal MRI. Spatially independent upsampling techniques, which are the state-of-the-art to address this problem, are error prone and disregard contextual information. In this paper we propose a context-sensitive upsampling method based on a residual convolutional neural network model that learns organ specific appearance and adopts semantically to input data allowing for the generation of high resolution images with sharp edges and fine scale detail. By making contextual decisions about appearance and shape, present in different parts of an image, we gain a maximum of structural detail at a similar contrast as provided by high-resolution data. We experiment on $145$ fetal scans and show that our approach yields an increased PSNR of $1.25$ $dB$ when applied to under-sampled fetal data \emph{cf.} baseline upsampling. Furthermore, our method yields an increased PSNR of $1.73$ $dB$ when utilizing under-sampled fetal data to perform brain volume reconstruction on motion corrupted captured data.
CVFeb 11, 2017
Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground TruthVanya V. Valindria, Ioannis Lavdas, Wenjia Bai et al.
When integrating computational tools such as automatic segmentation into clinical practice, it is of utmost importance to be able to assess the level of accuracy on new data, and in particular, to detect when an automatic method fails. However, this is difficult to achieve due to absence of ground truth. Segmentation accuracy on clinical data might be different from what is found through cross-validation because validation data is often used during incremental method development, which can lead to overfitting and unrealistic performance expectations. Before deployment, performance is quantified using different metrics, for which the predicted segmentation is compared to a reference segmentation, often obtained manually by an expert. But little is known about the real performance after deployment when a reference is unavailable. In this paper, we introduce the concept of reverse classification accuracy (RCA) as a framework for predicting the performance of a segmentation method on new data. In RCA we take the predicted segmentation from a new image to train a reverse classifier which is evaluated on a set of reference images with available ground truth. The hypothesis is that if the predicted segmentation is of good quality, then the reverse classifier will perform well on at least some of the reference images. We validate our approach on multi-organ segmentation with different classifiers and segmentation methods. Our results indicate that it is indeed possible to predict the quality of individual segmentations, in the absence of ground truth. Thus, RCA is ideal for integration into automatic processing pipelines in clinical routine and as part of large-scale image analysis studies.
CVDec 28, 2016
Unsupervised domain adaptation in brain lesion segmentation with adversarial networksKonstantinos Kamnitsas, Christian Baumgartner, Christian Ledig et al.
Significant advances have been made towards building accurate automatic segmentation systems for a variety of biomedical applications using machine learning. However, the performance of these systems often degrades when they are applied on new data that differ from the training data, for example, due to variations in imaging protocols. Manually annotating new data for each test domain is not a feasible solution. In this work we investigate unsupervised domain adaptation using adversarial neural networks to train a segmentation method which is more invariant to differences in the input data, and which does not require any annotations on the test domain. Specifically, we learn domain-invariant features by learning to counter an adversarial network, which attempts to classify the domain of the input data by observing the activations of the segmentation network. Furthermore, we propose a multi-connected domain discriminator for improved adversarial training. Our system is evaluated using two MR databases of subjects with traumatic brain injuries, acquired using different scanners and imaging protocols. Using our unsupervised approach, we obtain segmentation accuracies which are close to the upper bound of supervised domain adaptation.
CVDec 16, 2016
SonoNet: Real-Time Detection and Localisation of Fetal Standard Scan Planes in Freehand UltrasoundChristian F. Baumgartner, Konstantinos Kamnitsas, Jacqueline Matthew et al.
Identifying and interpreting fetal standard scan planes during 2D ultrasound mid-pregnancy examinations are highly complex tasks which require years of training. Apart from guiding the probe to the correct location, it can be equally difficult for a non-expert to identify relevant structures within the image. Automatic image processing can provide tools to help experienced as well as inexperienced operators with these tasks. In this paper, we propose a novel method based on convolutional neural networks which can automatically detect 13 fetal standard views in freehand 2D ultrasound data as well as provide a localisation of the fetal structures via a bounding box. An important contribution is that the network learns to localise the target anatomy using weak supervision based on image-level labels only. The network architecture is designed to operate in real-time while providing optimal output for the localisation task. We present results for real-time annotation, retrospective frame retrieval from saved videos, and localisation on a very large and challenging dataset consisting of images and video recordings of full clinical anomaly screenings. We found that the proposed method achieved an average F1-score of 0.798 in a realistic classification experiment modelling real-time detection, and obtained a 90.09% accuracy for retrospective frame retrieval. Moreover, an accuracy of 77.8% was achieved on the localisation task.