Sébastien Ourselin

CV
h-index69
38papers
6,489citations
Novelty47%
AI Score47

38 Papers

CVSep 19, 2023
UPL-SFDA: Uncertainty-aware Pseudo Label Guided Source-Free Domain Adaptation for Medical Image Segmentation

Jianghao Wu, Guotai Wang, Ran Gu et al.

Domain Adaptation (DA) is important for deep learning-based medical image segmentation models to deal with testing images from a new target domain. As the source-domain data are usually unavailable when a trained model is deployed at a new center, Source-Free Domain Adaptation (SFDA) is appealing for data and annotation-efficient adaptation to the target domain. However, existing SFDA methods have a limited performance due to lack of sufficient supervision with source-domain images unavailable and target-domain images unlabeled. We propose a novel Uncertainty-aware Pseudo Label guided (UPL) SFDA method for medical image segmentation. Specifically, we propose Target Domain Growing (TDG) to enhance the diversity of predictions in the target domain by duplicating the pre-trained model's prediction head multiple times with perturbations. The different predictions in these duplicated heads are used to obtain pseudo labels for unlabeled target-domain images and their uncertainty to identify reliable pseudo labels. We also propose a Twice Forward pass Supervision (TFS) strategy that uses reliable pseudo labels obtained in one forward pass to supervise predictions in the next forward pass. The adaptation is further regularized by a mean prediction-based entropy minimization term that encourages confident and consistent results in different prediction heads. UPL-SFDA was validated with a multi-site heart MRI segmentation dataset, a cross-modality fetal brain segmentation dataset, and a 3D fetal tissue segmentation dataset. It improved the average Dice by 5.54, 5.01 and 6.89 percentage points for the three tasks compared with the baseline, respectively, and outperformed several state-of-the-art SFDA methods.

IVApr 5, 2022
A Dempster-Shafer approach to trustworthy AI with application to fetal brain MRI segmentation

Lucas Fidon, Michael Aertsen, Florian Kofler et al.

Deep learning models for medical image segmentation can fail unexpectedly and spectacularly for pathological cases and images acquired at different centers than training images, with labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of deep learning models for medical image segmentation. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on artificial intelligence (AI). In this work, we propose a trustworthy AI theoretical framework and a practical system that can augment any backbone AI system using a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards the voxel-level labeling predicted by the backbone AI that violate expert knowledge and relies on a fallback for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal MRI consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of a state-of-the-art backbone AI for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.

CVDec 3, 2025
An Automated Framework for Large-Scale Graph-Based Cerebrovascular Analysis

Daniele Falcetta, Liane S. Canas, Lorenzo Suppa et al.

We present CaravelMetrics, a computational framework for automated cerebrovascular analysis that models vessel morphology through skeletonization-derived graph representations. The framework integrates atlas-based regional parcellation, centerline extraction, and graph construction to compute fifteen morphometric, topological, fractal, and geometric features. The features can be estimated globally from the complete vascular network or regionally within arterial territories, enabling multiscale characterization of cerebrovascular organization. Applied to 570 3D TOF-MRA scans from the IXI dataset (ages 20-86), CaravelMetrics yields reproducible vessel graphs capturing age- and sex-related variations and education-associated increases in vascular complexity, consistent with findings reported in the literature. The framework provides a scalable and fully automated approach for quantitative cerebrovascular feature extraction, supporting normative modeling and population-level studies of vascular health and aging.

IVMar 27, 2024Code
Generative Medical Segmentation

Jiayu Huo, Xi Ouyang, Sébastien Ourselin et al.

Rapid advancements in medical image segmentation performance have been significantly driven by the development of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These models follow the discriminative pixel-wise classification learning paradigm and often have limited ability to generalize across diverse medical imaging datasets. In this manuscript, we introduce Generative Medical Segmentation (GMS), a novel approach leveraging a generative model to perform image segmentation. Concretely, GMS employs a robust pre-trained vision foundation model to extract latent representations for images and corresponding ground truth masks, followed by a model that learns a mapping function from the image to the mask in the latent space. Once trained, the model generates an estimated segmentation mask using the pre-trained vision foundation model to decode the predicted latent representation back into the image space. The design of GMS leads to fewer trainable parameters in the model which reduces the risk of overfitting and enhances its generalization capability. Our experimental analysis across five public datasets in different medical imaging domains demonstrates GMS outperforms existing discriminative and generative segmentation models. Furthermore, GMS is able to generalize well across datasets from different centers within the same imaging modality. Our experiments suggest GMS offers a scalable and effective solution for medical image segmentation. GMS implementation and trained model weights are available at https://github.com/King-HAW/GMS.

ROApr 29, 2025Code
Hydra: Marker-Free RGB-D Hand-Eye Calibration

Martin Huber, Huanyu Tian, Christopher E. Mower et al.

This work presents an RGB-D imaging-based approach to marker-free hand-eye calibration using a novel implementation of the iterative closest point (ICP) algorithm with a robust point-to-plane (PTP) objective formulated on a Lie algebra. Its applicability is demonstrated through comprehensive experiments using three well known serial manipulators and two RGB-D cameras. With only three randomly chosen robot configurations, our approach achieves approximately 90% successful calibrations, demonstrating 2-3x higher convergence rates to the global optimum compared to both marker-based and marker-free baselines. We also report 2 orders of magnitude faster convergence time (0.8 +/- 0.4 s) for 9 robot configurations over other marker-free methods. Our method exhibits significantly improved accuracy (5 mm in task space) over classical approaches (7 mm in task space) whilst being marker-free. The benchmarking dataset and code are open sourced under Apache 2.0 License, and a ROS 2 integration with robot abstraction is provided to facilitate deployment.

IVDec 24, 2021Code
Generalized Wasserstein Dice Loss, Test-time Augmentation, and Transformers for the BraTS 2021 challenge

Lucas Fidon, Suprosanna Shit, Ivan Ezhov et al.

Brain tumor segmentation from multiple Magnetic Resonance Imaging (MRI) modalities is a challenging task in medical image computation. The main challenges lie in the generalizability to a variety of scanners and imaging protocols. In this paper, we explore strategies to increase model robustness without increasing inference time. Towards this aim, we explore finding a robust ensemble from models trained using different losses, optimizers, and train-validation data split. Importantly, we explore the inclusion of a transformer in the bottleneck of the U-Net architecture. While we find transformer in the bottleneck performs slightly worse than the baseline U-Net in average, the generalized Wasserstein Dice loss consistently produces superior results. Further, we adopt an efficient test time augmentation strategy for faster and robust inference. Our final ensemble of seven 3D U-Nets with test-time augmentation produces an average dice score of 89.4% and an average Hausdorff 95% distance of 10.0 mm when evaluated on the BraTS 2021 testing dataset. Our code and trained models are publicly available at https://github.com/LucasFidon/TRABIT_BraTS2021.

IVJul 8, 2021Code
Label-set Loss Functions for Partial Supervision: Application to Fetal Brain 3D MRI Parcellation

Lucas Fidon, Michael Aertsen, Doaa Emam et al.

Deep neural networks have increased the accuracy of automatic segmentation, however, their accuracy depends on the availability of a large number of fully segmented images. Methods to train deep neural networks using images for which some, but not all, regions of interest are segmented are necessary to make better use of partially annotated datasets. In this paper, we propose the first axiomatic definition of label-set loss functions that are the loss functions that can handle partially segmented images. We prove that there is one and only one method to convert a classical loss function for fully segmented images into a proper label-set loss function. Our theory also allows us to define the leaf-Dice loss, a label-set generalization of the Dice loss particularly suited for partial supervision with only missing labels. Using the leaf-Dice loss, we set a new state of the art in partially supervised learning for fetal brain 3D MRI segmentation. We achieve a deep neural network able to segment white matter, ventricles, cerebellum, extra-ventricular CSF, cortical gray matter, deep gray matter, brainstem, and corpus callosum based on fetal brain 3D MRI of anatomically normal fetuses or with open spina bifida. Our implementation of the proposed label-set loss functions is available at https://github.com/LucasFidon/label-set-loss-functions

IVMay 24, 2021Code
A self-supervised learning strategy for postoperative brain cavity segmentation simulating resections

Fernando Pérez-García, Reuben Dorent, Michele Rizzi et al.

Accurate segmentation of brain resection cavities (RCs) aids in postoperative analysis and determining follow-up treatment. Convolutional neural networks (CNNs) are the state-of-the-art image segmentation technique, but require large annotated datasets for training. Annotation of 3D medical images is time-consuming, requires highly-trained raters, and may suffer from high inter-rater variability. Self-supervised learning strategies can leverage unlabeled data for training. We developed an algorithm to simulate resections from preoperative magnetic resonance images (MRIs). We performed self-supervised training of a 3D CNN for RC segmentation using our simulation method. We curated EPISURG, a dataset comprising 430 postoperative and 268 preoperative MRIs from 430 refractory epilepsy patients who underwent resective neurosurgery. We fine-tuned our model on three small annotated datasets from different institutions and on the annotated images in EPISURG, comprising 20, 33, 19 and 133 subjects. The model trained on data with simulated resections obtained median (interquartile range) Dice score coefficients (DSCs) of 81.7 (16.4), 82.4 (36.4), 74.9 (24.2) and 80.5 (18.7) for each of the four datasets. After fine-tuning, DSCs were 89.2 (13.3), 84.1 (19.8), 80.2 (20.1) and 85.2 (10.8). For comparison, inter-rater agreement between human annotators from our previous study was 84.0 (9.9). We present a self-supervised learning strategy for 3D CNNs using simulated RCs to accurately segment real RCs on postoperative MRI. Our method generalizes well to data from different institutions, pathologies and modalities. Source code, segmentation models and the EPISURG dataset are available at https://github.com/fepegar/ressegijcars .

CVApr 12, 2021Code
An MRF-UNet Product of Experts for Image Segmentation

Mikael Brudfors, Yaël Balbastre, John Ashburner et al.

While convolutional neural networks (CNNs) trained by back-propagation have seen unprecedented success at semantic segmentation tasks, they are known to struggle on out-of-distribution data. Markov random fields (MRFs) on the other hand, encode simpler distributions over labels that, although less flexible than UNets, are less prone to over-fitting. In this paper, we propose to fuse both strategies by computing the product of distributions of a UNet and an MRF. As this product is intractable, we solve for an approximate distribution using an iterative mean-field approach. The resulting MRF-UNet is trained jointly by back-propagation. Compared to other works using conditional random fields (CRFs), the MRF has no dependency on the imaging data, which should allow for less over-fitting. We show on 3D neuroimaging data that this novel network improves generalisation to out-of-distribution samples. Furthermore, it allows the overall number of parameters to be reduced while preserving high accuracy. These results suggest that a classic MRF smoothness prior can allow for less over-fitting when principally integrated into a CNN model. Our implementation is available at https://github.com/balbasty/nitorch.

IVMar 21, 2021Code
MONAIfbs: MONAI-based fetal brain MRI deep learning segmentation

Marta B. M. Ranzini, Lucas Fidon, Sébastien Ourselin et al.

In fetal Magnetic Resonance Imaging, Super Resolution Reconstruction (SRR) algorithms are becoming popular tools to obtain high-resolution 3D volume reconstructions from low-resolution stacks of 2D slices, acquired at different orientations. To be effective, these algorithms often require accurate segmentation of the region of interest, such as the fetal brain in suspected pathological cases. In the case of Spina Bifida, Ebner, Wang et al. (NeuroImage, 2020) combined their SRR algorithm with a 2-step segmentation pipeline (2D localisation followed by a 2D segmentation network). However, if the localisation step fails, the second network is not able to recover a correct brain mask, thus requiring manual corrections for an effective SRR. In this work, we aim at improving the fetal brain segmentation for SRR in Spina Bifida. We hypothesise that a well-trained single-step UNet can achieve accurate performance, avoiding the need of a 2-step approach. We propose a new tool for fetal brain segmentation called MONAIfbs, which takes advantage of the Medical Open Network for Artificial Intelligence (MONAI) framework. Our network is based on the dynamic UNet (dynUNet), an adaptation of the nnU-Net framework. When compared to the original 2-step approach proposed in Ebner-Wang, and the same Ebner-Wang approach retrained with the expanded dataset available for this work, the dynUNet showed to achieve higher performance using a single step only. It also showed to reduce the number of outliers, as only 28 stacks obtained Dice score less than 0.9, compared to 68 for Ebner-Wang and 53 Ebner-Wang expanded. The proposed dynUNet model thus provides an improvement of the state-of-the-art fetal brain segmentation techniques, reducing the need for manual correction in automated SRR pipelines. Our code and our trained model are made publicly available at https://github.com/gift-surg/MONAIfbs.

IVSep 22, 2020Code
CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation

Ran Gu, Guotai Wang, Tao Song et al.

Accurate medical image segmentation is essential for diagnosis and treatment planning of diseases. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they are still challenged by complicated conditions where the segmentation target has large variations of position, shape and scale, and existing CNNs have a poor explainability that limits their application to clinical decisions. In this work, we make extensive use of multiple attentions in a CNN architecture and propose a comprehensive attention-based CNN (CA-Net) for more accurate and explainable medical image segmentation that is aware of the most important spatial positions, channels and scales at the same time. In particular, we first propose a joint spatial attention module to make the network focus more on the foreground region. Then, a novel channel attention module is proposed to adaptively recalibrate channel-wise feature responses and highlight the most relevant feature channels. Also, we propose a scale attention module implicitly emphasizing the most salient feature maps among multiple scales so that the CNN is adaptive to the size of an object. Extensive experiments on skin lesion segmentation from ISIC 2018 and multi-class segmentation of fetal MRI found that our proposed CA-Net significantly improved the average segmentation Dice score from 87.77% to 92.08% for skin lesion, 84.79% to 87.08% for the placenta and 93.20% to 95.88% for the fetal brain respectively compared with U-Net. It reduced the model size to around 15 times smaller with close or even better accuracy compared with state-of-the-art DeepLabv3+. In addition, it has a much higher explainability than existing networks by visualizing the attention weight maps. Our code is available at https://github.com/HiLab-git/CA-Net

IVMar 9, 2020Code
TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning

Fernando Pérez-García, Rachel Sparks, Sébastien Ourselin

Processing of medical images such as MRI or CT presents unique challenges compared to RGB images typically used in computer vision. These include a lack of labels for large datasets, high computational costs, and metadata to describe the physical properties of voxels. Data augmentation is used to artificially increase the size of the training datasets. Training with image patches decreases the need for computational power. Spatial metadata needs to be carefully taken into account in order to ensure a correct alignment of volumes. We present TorchIO, an open-source Python library to enable efficient loading, preprocessing, augmentation and patch-based sampling of medical images for deep learning. TorchIO follows the style of PyTorch and integrates standard medical image processing libraries to efficiently process images during training of neural networks. TorchIO transforms can be composed, reproduced, traced and extended. We provide multiple generic preprocessing and augmentation operations as well as simulation of MRI-specific artifacts. Source code, comprehensive tutorials and extensive documentation for TorchIO can be found at https://torchio.rtfd.io/. The package can be installed from the Python Package Index running 'pip install torchio'. It includes a command-line interface which allows users to apply transforms to image files without using Python. Additionally, we provide a graphical interface within a TorchIO extension in 3D Slicer to visualize the effects of transforms. TorchIO was developed to help researchers standardize medical image processing pipelines and allow them to focus on the deep learning experiments. It encourages open science, as it supports reproducibility and is version controlled so that the software can be cited precisely. Due to its modularity, the library is compatible with other frameworks for deep learning with medical images.

LGJan 8, 2020Code
Distributionally Robust Deep Learning using Hardness Weighted Sampling

Lucas Fidon, Michael Aertsen, Thomas Deprest et al.

Limiting failures of machine learning systems is of paramount importance for safety-critical applications. In order to improve the robustness of machine learning systems, Distributionally Robust Optimization (DRO) has been proposed as a generalization of Empirical Risk Minimization (ERM). However, its use in deep learning has been severely restricted due to the relative inefficiency of the optimizers available for DRO in comparison to the wide-spread variants of Stochastic Gradient Descent (SGD) optimizers for ERM. We propose SGD with hardness weighted sampling, a principled and efficient optimization method for DRO in machine learning that is particularly suited in the context of deep learning. Similar to a hard example mining strategy in practice, the proposed algorithm is straightforward to implement and computationally as efficient as SGD-based optimizers used for deep learning, requiring minimal overhead computation. In contrast to typical ad hoc hard mining approaches, we prove the convergence of our DRO algorithm for over-parameterized deep learning networks with ReLU activation and a finite number of layers and parameters. Our experiments on fetal brain 3D MRI segmentation and brain tumor segmentation in MRI demonstrate the feasibility and the usefulness of our approach. Using our hardness weighted sampling for training a state-of-the-art deep learning pipeline leads to improved robustness to anatomical variabilities in automatic fetal brain 3D MRI segmentation using deep learning and to improved robustness to the image protocol variations in brain tumor segmentation. Our code is available at https://github.com/LucasFidon/HardnessWeightedSampler.

CVSep 11, 2017Code
NiftyNet: a deep-learning platform for medical imaging

Eli Gibson, Wenqi Li, Carole Sudre et al.

Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this application requires substantial implementation effort. Thus, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. NiftyNet provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncracies of medical image analysis and computer-assisted intervention. NiftyNet is built on TensorFlow and supports TensorBoard visualization of 2D and 3D images and computational graphs by default. We present 3 illustrative medical image analysis applications built using NiftyNet: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. NiftyNet enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications.

ROOct 27, 2025
Localising under the drape: proprioception in the era of distributed surgical robotic system

Martin Huber, Nicola A. Cavalcanti, Ayoob Davoodi et al.

Despite their mechanical sophistication, surgical robots remain blind to their surroundings. This lack of spatial awareness causes collisions, system recoveries, and workflow disruptions, issues that will intensify with the introduction of distributed robots with independent interacting arms. Existing tracking systems rely on bulky infrared cameras and reflective markers, providing only limited views of the surgical scene and adding hardware burden in crowded operating rooms. We present a marker-free proprioception method that enables precise localisation of surgical robots under their sterile draping despite associated obstruction of visual cues. Our method solely relies on lightweight stereo-RGB cameras and novel transformer-based deep learning models. It builds on the largest multi-centre spatial robotic surgery dataset to date (1.4M self-annotated images from human cadaveric and preclinical in vivo studies). By tracking the entire robot and surgical scene, rather than individual markers, our approach provides a holistic view robust to occlusions, supporting surgical scene understanding and context-aware control. We demonstrate an example of potential clinical benefits during in vivo breathing compensation with access to tissue dynamics, unobservable under state of the art tracking, and accurately locate in multi-robot systems for future intelligent interaction. In addition, and compared with existing systems, our method eliminates markers and improves tracking visibility by 25%. To our knowledge, this is the first demonstration of marker-free proprioception for fully draped surgical robots, reducing setup complexity, enhancing safety, and paving the way toward modular and autonomous robotic surgery.

IVNov 3, 2021
Partial supervision for the FeTA challenge 2021

Lucas Fidon, Michael Aertsen, Suprosanna Shit et al.

This paper describes our method for our participation in the FeTA challenge2021 (team name: TRABIT). The performance of convolutional neural networks for medical image segmentation is thought to correlate positively with the number of training data. The FeTA challenge does not restrict participants to using only the provided training data but also allows for using other publicly available sources. Yet, open access fetal brain data remains limited. An advantageous strategy could thus be to expand the training data to cover broader perinatal brain imaging sources. Perinatal brain MRIs, other than the FeTA challenge data, that are currently publicly available, span normal and pathological fetal atlases as well as neonatal scans. However, perinatal brain MRIs segmented in different datasets typically come with different annotation protocols. This makes it challenging to combine those datasets to train a deep neural network. We recently proposed a family of loss functions, the label-set loss functions, for partially supervised learning. Label-set loss functions allow to train deep neural networks with partially segmented images, i.e. segmentations in which some classes may be grouped into super-classes. We propose to use label-set loss functions to improve the segmentation performance of a state-of-the-art deep learning pipeline for multi-class fetal brain segmentation by merging several publicly available datasets. To promote generalisability, our approach does not introduce any additional hyper-parameters tuning.

IVSep 30, 2021
Deep Homography Estimation in Dynamic Surgical Scenes for Laparoscopic Camera Motion Extraction

Martin Huber, Sébastien Ourselin, Christos Bergeles et al.

Current laparoscopic camera motion automation relies on rule-based approaches or only focuses on surgical tools. Imitation Learning (IL) methods could alleviate these shortcomings, but have so far been applied to oversimplified setups. Instead of extracting actions from oversimplified setups, in this work we introduce a method that allows to extract a laparoscope holder's actions from videos of laparoscopic interventions. We synthetically add camera motion to a newly acquired dataset of camera motion free da Vinci surgery image sequences through a novel homography generation algorithm. The synthetic camera motion serves as a supervisory signal for camera motion estimation that is invariant to object and tool motion. We perform an extensive evaluation of state-of-the-art (SOTA) Deep Neural Networks (DNNs) across multiple compute regimes, finding our method transfers from our camera motion free da Vinci surgery dataset to videos of laparoscopic interventions, outperforming classical homography estimation approaches in both, precision by 41%, and runtime on a CPU by 43%.

CVAug 9, 2021
Distributionally Robust Segmentation of Abnormal Fetal Brain 3D MRI

Lucas Fidon, Michael Aertsen, Nada Mufti et al.

The performance of deep neural networks typically increases with the number of training images. However, not all images have the same importance towards improved performance and robustness. In fetal brain MRI, abnormalities exacerbate the variability of the developing brain anatomy compared to non-pathological cases. A small number of abnormal cases, as is typically available in clinical datasets used for training, are unlikely to fairly represent the rich variability of abnormal developing brains. This leads machine learning systems trained by maximizing the average performance to be biased toward non-pathological cases. This problem was recently referred to as hidden stratification. To be suited for clinical use, automatic segmentation methods need to reliably achieve high-quality segmentation outcomes also for pathological cases. In this paper, we show that the state-of-the-art deep learning pipeline nnU-Net has difficulties to generalize to unseen abnormal cases. To mitigate this problem, we propose to train a deep neural network to minimize a percentile of the distribution of per-volume loss over the dataset. We show that this can be achieved by using Distributionally Robust Optimization (DRO). DRO automatically reweights the training samples with lower performance, encouraging nnU-Net to perform more consistently on all cases. We validated our approach using a dataset of 368 fetal brain T2w MRIs, including 124 MRIs of open spina bifida cases and 51 MRIs of cases with other severe abnormalities of brain development.

CVJun 22, 2021
Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures

Fernando Pérez-García, Catherine Scott, Rachel Sparks et al.

Detailed analysis of seizure semiology, the symptoms and signs which occur during a seizure, is critical for management of epilepsy patients. Inter-rater reliability using qualitative visual analysis is often poor for semiological features. Therefore, automatic and quantitative analysis of video-recorded seizures is needed for objective assessment. We present GESTURES, a novel architecture combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to learn deep representations of arbitrarily long videos of epileptic seizures. We use a spatiotemporal CNN (STCNN) pre-trained on large human action recognition (HAR) datasets to extract features from short snippets (approx. 0.5 s) sampled from seizure videos. We then train an RNN to learn seizure-level representations from the sequence of features. We curated a dataset of seizure videos from 68 patients and evaluated GESTURES on its ability to classify seizures into focal onset seizures (FOSs) (N = 106) vs. focal to bilateral tonic-clonic seizures (TCSs) (N = 77), obtaining an accuracy of 98.9% using bidirectional long short-term memory (BLSTM) units. We demonstrate that an STCNN trained on a HAR dataset can be used in combination with an RNN to accurately represent arbitrarily long videos of seizures. GESTURES can provide accurate seizure classification by modeling sequences of semiologies.

LGApr 8, 2021
CARRNN: A Continuous Autoregressive Recurrent Neural Network for Deep Representation Learning from Sporadic Temporal Data

Mostafa Mehdipour Ghazi, Lauge Sørensen, Sébastien Ourselin et al.

Learning temporal patterns from multivariate longitudinal data is challenging especially in cases when data is sporadic, as often seen in, e.g., healthcare applications where the data can suffer from irregularity and asynchronicity as the time between consecutive data points can vary across features and samples, hindering the application of existing deep learning models that are constructed for complete, evenly spaced data with fixed sequence lengths. In this paper, a novel deep learning-based model is developed for modeling multiple temporal features in sporadic data using an integrated deep learning architecture based on a recurrent neural network (RNN) unit and a continuous-time autoregressive (CAR) model. The proposed model, called CARRNN, uses a generalized discrete-time autoregressive model that is trainable end-to-end using neural networks modulated by time lags to describe the changes caused by the irregularity and asynchronicity. It is applied to multivariate time-series regression tasks using data provided for Alzheimer's disease progression modeling and intensive care unit (ICU) mortality rate prediction, where the proposed model based on a gated recurrent unit (GRU) achieves the lowest prediction errors among the proposed RNN-based models and state-of-the-art methods using GRUs and long short-term memory (LSTM) networks in their architecture.

CVApr 5, 2021
Multi-Atlas Based Pathological Stratification of d-TGA Congenital Heart Disease

Maria A. Zuluaga, Alex F. Mendelson, M. Jorge Cardoso et al.

One of the main sources of error in multi-atlas segmentation propagation approaches comes from the use of atlas databases that are morphologically dissimilar to the target image. In this work, we exploit the segmentation errors associated with poor atlas selection to build a computer aided diagnosis (CAD) system for pathological classification in post-operative dextro-transposition of the great arteries (d-TGA). The proposed approach extracts a set of features, which describe the quality of a segmentation, and introduces them into a logical decision tree that provides the final diagnosis. We have validated our method on a set of 60 whole heart MR images containing healthy cases and two different forms of post-operative d-TGA. The reported overall CAD system accuracy was of 93.33%.

CVSep 7, 2020
Real-Time Segmentation of Non-Rigid Surgical Tools based on Deep Learning and Tracking

Luis C. García-Peraza-Herrera, Wenqi Li, Caspar Gruijthuijsen et al.

Real-time tool segmentation is an essential component in computer-assisted surgical systems. We propose a novel real-time automatic method based on Fully Convolutional Networks (FCN) and optical flow tracking. Our method exploits the ability of deep neural networks to produce accurate segmentations of highly deformable parts along with the high speed of optical flow. Furthermore, the pre-trained FCN can be fine-tuned on a small amount of medical images without the need to hand-craft features. We validated our method using existing and new benchmark datasets, covering both ex vivo and in vivo real clinical cases where different surgical instruments are employed. Two versions of the method are presented, non-real-time and real-time. The former, using only deep learning, achieves a balanced accuracy of 89.6% on a real clinical dataset, outperforming the (non-real-time) state of the art by 3.8% points. The latter, a combination of deep learning with optical flow tracking, yields an average balanced accuracy of 78.2% across all the validated datasets.

IVJun 28, 2020
Simulation of Brain Resection for Cavity Segmentation Using Self-Supervised and Semi-Supervised Learning

Fernando Pérez-García, Roman Rodionov, Ali Alim-Marvasti et al.

Resective surgery may be curative for drug-resistant focal epilepsy, but only 40% to 70% of patients achieve seizure freedom after surgery. Retrospective quantitative analysis could elucidate patterns in resected structures and patient outcomes to improve resective surgery. However, the resection cavity must first be segmented on the postoperative MR image. Convolutional neural networks (CNNs) are the state-of-the-art image segmentation technique, but require large amounts of annotated data for training. Annotation of medical images is a time-consuming process requiring highly-trained raters, and often suffering from high inter-rater variability. Self-supervised learning can be used to generate training instances from unlabeled data. We developed an algorithm to simulate resections on preoperative MR images. We curated a new dataset, EPISURG, comprising 431 postoperative and 269 preoperative MR images from 431 patients who underwent resective surgery. In addition to EPISURG, we used three public datasets comprising 1813 preoperative MR images for training. We trained a 3D CNN on artificially resected images created on the fly during training, using images from 1) EPISURG, 2) public datasets and 3) both. To evaluate trained models, we calculate Dice score (DSC) between model segmentations and 200 manual annotations performed by three human raters. The model trained on data with manual annotations obtained a median (interquartile range) DSC of 65.3 (30.6). The DSC of our best-performing model, trained with no manual annotations, is 81.7 (14.2). For comparison, inter-rater agreement between human annotators was 84.0 (9.9). We demonstrate a training method for CNNs using simulated resection cavities that can accurately segment real resection cavities, without manual annotations.

CVApr 20, 2020
Combining multimodal information for Metal Artefact Reduction: An unsupervised deep learning framework

Marta B. M. Ranzini, Irme Groothuis, Kerstin Kläser et al.

Metal artefact reduction (MAR) techniques aim at removing metal-induced noise from clinical images. In Computed Tomography (CT), supervised deep learning approaches have been shown effective but limited in generalisability, as they mostly rely on synthetic data. In Magnetic Resonance Imaging (MRI) instead, no method has yet been introduced to correct the susceptibility artefact, still present even in MAR-specific acquisitions. In this work, we hypothesise that a multimodal approach to MAR would improve both CT and MRI. Given their different artefact appearance, their complementary information can compensate for the corrupted signal in either modality. We thus propose an unsupervised deep learning method for multimodal MAR. We introduce the use of Locally Normalised Cross Correlation as a loss term to encourage the fusion of multimodal information. Experiments show that our approach favours a smoother correction in the CT, while promoting signal recovery in the MRI.

IVApr 5, 2020
Automatic Right Ventricle Segmentation using Multi-Label Fusion in Cardiac MRI

Maria A. Zuluaga, M. Jorge Cardoso, Sébastien Ourselin

Accurate segmentation of the right ventricle (RV) is a crucial step in the assessment of the ventricular structure and function. Yet, due to its complex anatomy and motion segmentation of the RV has not been as largely studied as the left ventricle. This paper presents a fully automatic method for the segmentation of the RV in cardiac magnetic resonance images (MRI). The method uses a coarse-to-fine segmentation strategy in combination with a multi-atlas propagation segmentation framework. Based on a cross correlation metric, our method selects the best atlases for propagation allowing the refinement of the segmentation at each iteration of the propagation. The proposed method was evaluated on 32 cardiac MRI datasets provided by the RV Segmentation Challenge in Cardiac MRI.

CVOct 2, 2019
Privacy-preserving Federated Brain Tumour Segmentation

Wenqi Li, Fausto Milletarì, Daguang Xu et al.

Due to medical data privacy regulations, it is often infeasible to collect and share patient data in a centralised data lake. This poses challenges for training machine learning algorithms, such as deep convolutional networks, which often require large numbers of diverse training examples. Federated learning sidesteps this difficulty by bringing code to the patient data owners and only sharing intermediate model training updates among them. Although a high-accuracy model could be achieved by appropriately aggregating these model updates, the model shared could indirectly leak the local training examples. In this paper, we investigate the feasibility of applying differential-privacy techniques to protect the patient data in a federated learning setup. We implement and evaluate practical federated learning systems for brain tumour segmentation on the BraTS dataset. The experimental results show that there is a trade-off between model performance and privacy protection costs.

CVSep 4, 2019
Let's agree to disagree: learning highly debatable multirater labelling

Carole H. Sudre, Beatriz Gomez Anson, Silvia Ingala et al.

Classification and differentiation of small pathological objects may greatly vary among human raters due to differences in training, expertise and their consistency over time. In a radiological setting, objects commonly have high within-class appearance variability whilst sharing certain characteristics across different classes, making their distinction even more difficult. As an example, markers of cerebral small vessel disease, such as enlarged perivascular spaces (EPVS) and lacunes, can be very varied in their appearance while exhibiting high inter-class similarity, making this task highly challenging for human raters. In this work, we investigate joint models of individual rater behaviour and multirater consensus in a deep learning setting, and apply it to a brain lesion object-detection task. Results show that jointly modelling both individual and consensus estimates leads to significant improvements in performance when compared to directly predicting consensus labels, while also allowing the characterization of human-rater consistency.

IVAug 16, 2019
Multi-Domain Adaptation in Brain MRI through Paired Consistency and Adversarial Learning

Mauricio Orbes-Arteaga, Thomas Varsavsky, Carole H. Sudre et al.

Supervised learning algorithms trained on medical images will often fail to generalize across changes in acquisition parameters. Recent work in domain adaptation addresses this challenge and successfully leverages labeled data in a source domain to perform well on an unlabeled target domain. Inspired by recent work in semi-supervised learning we introduce a novel method to adapt from one source domain to $n$ target domains (as long as there is paired data covering all domains). Our multi-domain adaptation method utilises a consistency loss combined with adversarial learning. We provide results on white matter lesion hyperintensity segmentation from brain MRIs using the MICCAI 2017 challenge data as the source domain and two target domains. The proposed method significantly outperforms other domain adaptation baselines.

APAug 14, 2019
Robust parametric modeling of Alzheimer's disease progression

Mostafa Mehdipour Ghazi, Mads Nielsen, Akshay Pai et al.

Quantitative characterization of disease progression using longitudinal data can provide long-term predictions for the pathological stages of individuals. This work studies the robust modeling of Alzheimer's disease progression using parametric methods. The proposed method linearly maps the individual's age to a disease progression score (DPS) and jointly fits constrained generalized logistic functions to the longitudinal dynamics of biomarkers as functions of the DPS using M-estimation. Robustness of the estimates is quantified using bootstrapping via Monte Carlo resampling, and the estimated inflection points of the fitted functions are used to temporally order the modeled biomarkers in the disease course. Kernel density estimation is applied to the obtained DPSs for clinical status classification using a Bayesian classifier. Different M-estimators and logistic functions, including a novel type proposed in this study, called modified Stannard, are evaluated on the data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) for robust modeling of volumetric MRI and PET biomarkers, CSF measurements, as well as cognitive tests. The results show that the modified Stannard function fitted using the logistic loss achieves the best modeling performance with an average normalized MAE of 0.991 across all biomarkers and bootstraps. Applied to the ADNI test set, this model achieves a multiclass AUC of 0.934 in clinical status classification. The obtained results for the proposed model outperform almost all state-of-the-art results in predicting biomarker values and classifying clinical status. Finally, the experiments show that the proposed model, trained using abundant ADNI data, generalizes well to data from the National Alzheimer's Coordinating Center (NACC) with an average normalized MAE of 1.182 and a multiclass AUC of 0.929.

IVJul 25, 2019
Hetero-Modal Variational Encoder-Decoder for Joint Modality Completion and Segmentation

Reuben Dorent, Samuel Joutard, Marc Modat et al.

We propose a new deep learning method for tumour segmentation when dealing with missing imaging modalities. Instead of producing one network for each possible subset of observed modalities or using arithmetic operations to combine feature maps, our hetero-modal variational 3D encoder-decoder independently embeds all observed modalities into a shared latent representation. Missing data and tumour segmentation can be then generated from this embedding. In our scenario, the input is a random subset of modalities. We demonstrate that the optimisation problem can be seen as a mixture sampling. In addition to this, we introduce a new network architecture building upon both the 3D U-Net and the Multi-Modal Variational Auto-Encoder (MVAE). Finally, we evaluate our method on BraTS2018 using subsets of the imaging modalities as input. Our model outperforms the current state-of-the-art method for dealing with missing modalities and achieves similar performance to the subset-specific equivalent networks.

CVApr 1, 2019
Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge

Hugo J. Kuijf, J. Matthijs Biesbroek, Jeroen de Bresser et al.

Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their method on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge (https://wmh.isi.uu.nl/). Sixty T1+FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. Segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: (1) Dice similarity coefficient, (2) modified Hausdorff distance (95th percentile), (3) absolute log-transformed volume difference, (4) sensitivity for detecting individual lesions, and (5) F1-score for individual lesions. Additionally, methods were ranked on their inter-scanner robustness. Twenty participants submitted their method for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.

CVAug 20, 2018
Simultaneous synthesis of FLAIR and segmentation of white matter hypointensities from T1 MRIs

Mauricio Orbes-Arteaga, M. Jorge Cardoso, Lauge Sørensen et al.

Segmenting vascular pathologies such as white matter lesions in Brain magnetic resonance images (MRIs) require acquisition of multiple sequences such as T1-weighted (T1-w) --on which lesions appear hypointense-- and fluid attenuated inversion recovery (FLAIR) sequence --where lesions appear hyperintense--. However, most of the existing retrospective datasets do not consist of FLAIR sequences. Existing missing modality imputation methods separate the process of imputation, and the process of segmentation. In this paper, we propose a method to link both modality imputation and segmentation using convolutional neural networks. We show that by jointly optimizing the imputation network and the segmentation network, the method not only produces more realistic synthetic FLAIR images from T1-w images, but also improves the segmentation of WMH from T1-w images only.

CVJul 9, 2018
Weakly-Supervised Convolutional Neural Networks for Multimodal Image Registration

Yipeng Hu, Marc Modat, Eli Gibson et al.

One of the fundamental challenges in supervised learning for multimodal image registration is the lack of ground-truth for voxel-level spatial correspondence. This work describes a method to infer voxel-level transformation from higher-level correspondence information contained in anatomical labels. We argue that such labels are more reliable and practical to obtain for reference sets of image pairs than voxel-level correspondence. Typical anatomical labels of interest may include solid organs, vessels, ducts, structure boundaries and other subject-specific ad hoc landmarks. The proposed end-to-end convolutional neural network approach aims to predict displacement fields to align multiple labelled corresponding structures for individual image pairs during the training, while only unlabelled image pairs are used as the network input for inference. We highlight the versatility of the proposed strategy, for training, utilising diverse types of anatomical labels, which need not to be identifiable over all training image pairs. At inference, the resulting 3D deformable image registration algorithm runs in real-time and is fully-automated without requiring any anatomical labels or initialisation. Several network architecture variants are compared for registering T2-weighted magnetic resonance images and 3D transrectal ultrasound images from prostate cancer patients. A median target registration error of 3.6 mm on landmark centroids and a median Dice of 0.87 on prostate glands are achieved from cross-validation experiments, in which 108 pairs of multimodal images from 76 patients were tested with high-quality anatomical labels.

CVJun 13, 2018
Boosted Training of Convolutional Neural Networks for Multi-Class Segmentation

Lorenz Berger, Eoin Hyde, Matt Gibb et al.

Training deep neural networks on large and sparse datasets is still challenging and can require large amounts of computation and memory. In this work, we address the task of performing semantic segmentation on large volumetric data sets, such as CT scans. Our contribution is threefold: 1) We propose a boosted sampling scheme that uses a-posterior error maps, generated throughout training, to focus sampling on difficult regions, resulting in a more informative loss. This results in a significant training speed up and improves learning performance for image segmentation. 2) We propose a novel algorithm for boosting the SGD learning rate schedule by adaptively increasing and lowering the learning rate, avoiding the need for extensive hyperparameter tuning. 3) We show that our method is able to attain new state-of-the-art results on the VISCERAL Anatomy benchmark.

CVJun 8, 2018
VTrails: Inferring Vessels with Geodesic Connectivity Trees

Stefano Moriconi, Maria A. Zuluaga, H. Rolf Jäger et al.

The analysis of vessel morphology and connectivity has an impact on a number of cardiovascular and neurovascular applications by providing patient-specific high-level quantitative features such as spatial location, direction and scale. In this paper we present an end-to-end approach to extract an acyclic vascular tree from angiographic data by solving a connectivity-enforcing anisotropic fast marching over a voxel-wise tensor field representing the orientation of the underlying vascular tree. The method is validated using synthetic and real vascular images. We compare VTrails against classical and state-of-the-art ridge detectors for tubular structures by assessing the connectedness of the vesselness map and inspecting the synthesized tensor field as proof of concept. VTrails performance is evaluated on images with different levels of degradation: we verify that the extracted vascular network is an acyclic graph (i.e. a tree), and we report the extraction accuracy, precision and recall.

CVFeb 28, 2018
Retrieval and Registration of Long-Range Overlapping Frames for Scalable Mosaicking of In Vivo Fetoscopy

Loïc Peter, Marcel Tella-Amo, Dzhoshkun Ismail Shakir et al.

Purpose: The standard clinical treatment of Twin-to-Twin Transfusion Syndrome consists in the photo-coagulation of undesired anastomoses located on the placenta which are responsible to a blood transfer between the two twins. While being the standard of care procedure, fetoscopy suffers from a limited field-of-view of the placenta resulting in missed anastomoses. To facilitate the task of the clinician, building a global map of the placenta providing a larger overview of the vascular network is highly desired. Methods: To overcome the challenging visual conditions inherent to in vivo sequences (low contrast, obstructions or presence of artifacts, among others), we propose the following contributions: (i) robust pairwise registration is achieved by aligning the orientation of the image gradients, and (ii) difficulties regarding long-range consistency (e.g. due to the presence of outliers) is tackled via a bag-of-word strategy, which identifies overlapping frames of the sequence to be registered regardless of their respective location in time. Results: In addition to visual difficulties, in vivo sequences are characterised by the intrinsic absence of gold standard. We present mosaics motivating qualitatively our methodological choices and demonstrating their promising aspect. We also demonstrate semi-quantitatively, via visual inspection of registration results, the efficacy of our registration approach in comparison to two standard baselines. Conclusion: This paper proposes the first approach for the construction of mosaics of placenta in in vivo fetoscopy sequences. Robustness to visual challenges during registration and long-range temporal consistency are proposed, offering first positive results on in vivo data for which standard mosaicking techniques are not applicable.

CVJul 11, 2017
Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations

Carole H Sudre, Wenqi Li, Tom Vercauteren et al.

Deep-learning has proved in recent years to be a powerful tool for image analysis and is now widely used to segment both 2D and 3D medical images. Deep-learning segmentation frameworks rely not only on the choice of network architecture but also on the choice of loss function. When the segmentation process targets rare observations, a severe class imbalance is likely to occur between candidate labels, thus resulting in sub-optimal performance. In order to mitigate this issue, strategies such as the weighted cross-entropy function, the sensitivity function or the Dice loss function, have been proposed. In this work, we investigate the behavior of these loss functions and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks. We also propose to use the class re-balancing properties of the Generalized Dice overlap, a known metric for segmentation assessment, as a robust and accurate deep-learning loss function for unbalanced tasks.

MLFeb 18, 2016
What is the distribution of the number of unique original items in a bootstrap sample?

Alex F. Mendelson, Maria A. Zuluaga, Brian F. Hutton et al.

Sampling with replacement occurs in many settings in machine learning, notably in the bagging ensemble technique and the .632+ validation scheme. The number of unique original items in a bootstrap sample can have an important role in the behaviour of prediction models learned on it. Indeed, there are uncontrived examples where duplicate items have no effect. The purpose of this report is to present the distribution of the number of unique original items in a bootstrap sample clearly and concisely, with a view to enabling other machine learning researchers to understand and control this quantity in existing and future resampling techniques. We describe the key characteristics of this distribution along with the generalisation for the case where items come from distinct categories, as in classification. In both cases we discuss the normal limit, and conduct an empirical investigation to derive a heuristic for when a normal approximation is permissible.